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I. INTRODUCTION 


The design of very large scale integrated (VLSI) circuits 
1s a very time consuming process. To reduce the time and 
cost required to design VLSI circuits various silicon 
compilers have been developed [Ref. 1]. One of these 
Compelers; =the MacPitts Silicon compiler, was developed at 
MIT Lincoln Laboratory in 1981-1982 [Refs. 2 and 3]. 

The MacPitts silicon compiler is a large and complex 
computer program that frees the circuit designer from having 
to worry about the details of the actual design and layout 
of the circuit. From a short program (usually less than 
fifty lines) that contains a functional description of the 
desired circuit, MacPitts completely designs an implementa- 
tion of the VLSI chip and outputs a file in Caltech 
Intermediate Form (CIF) that describes the circuit. The CIF 
file can be used to perform a functional simulation or a 
timing analysis of the circuit. After verifying the 
functional correctness of the circuit the CIF file can be 
2602 00 5 silicon foundry so that the circuit can be 
fabricated. 

The MacPitts compiler has been used previously at the 
Naval Postgraduate School by Carlson [Ref. 4] to design a 


Fie inne multiplier ċircuit. 23721856255 thesis contains 58 


۳1 


description of the MacPitts language, which is used to write 
the .mac program that contains a functional description of 
the circuit to be designed by the compiler. Also, a 
detailed procedure on how to write the .mac program is 
given. Carlson also shows how to use the MacPitts inter- 
preter to test the functional correctness of the .mac program 
before the circuit design is performed by the compiler. In 
addition, Carlson's thesis gives a detailed listing of the 
activities in the MacPitts design cycle used to design VLSI 
circuits. The design cycle includes generating the .mac 
program, submitting the .mac program to the compiler for 
circuit design and performing a design rule check and 
functional event simulation on the designed circuit. 

Since a good understanding of how to use the MacPitts 
Silicon compiler to design VLSI circuits was obtained by 
Carlson [Ref. 4] it was decided that the next logical step 
was to learn more about the MacPitts architecture and to 
make some performance comparisons between various MacPitts 
and hand-crafted designs. The first goal of this thesis 
research was to determine what basic building blocks the 
compiler used to design VLSI circuits and how these building 
blocks are used to implement different circuits. Also, an 
understanding of how the statements in the .mac program 
determine the structure of the MacPitts designed circuit was 


desired. 
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It was decided to use the MacPitts compiler to design 
various adder circuits so that performance (chip size, power 
and speed) comparisons could be made between the MacPitts 
designs and a hand-crafted pipeline adder circuit designed 
200:1:3d1 and Hauenstein [Ref. 5]. Crystal, the VLSI 
timing analysis program developed at the University of 
California at Berkeley [Ref. 6], was used to analyze the 
timing requirements of all circuits being compared. Since 
Crystal had never been used at the Naval Postgraduate School 
before, a procedure on how to use Crystal to analyze MacPitts 
designs had to be developed. This required adapting the 
basic Crystal Commands to the unconventional MacPitts 
three-phase overlapping clock scheme. 

The third research goal was to obtain a more complete 
understanding and description of the MacPitts interpreter 
than currently available in the literature. Reference 2 
and reference 4 describe how to use the interpreter, but a 
detailed description of the interpreter commands and error 
Etlmements and its capabilities and limitations is not 
available. 

Chapter II of this thesis describes the basic circuit 
pu 077 Dlocks of the MacPitts compiler. The design of 
several combinational and pipeline adder circuits is 
presented in Chapter III. Chapter IV lists performance 


comparisons between the MacPitts adders and the Conradi and 
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Hauenstein adder [Ref. 5] along with a tutorial on Crystal. 
Design errors that have been found in MacPitts designs are 
detailed in Chapter V. Tutorial material on the MacPitts 


interpreter is found in Appendix A. 
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II. THE MACPITTS DESIGN ARCHITECTURE 


A. INTRODUCTION 

The MacPitts design structure consists of five main 
components. They are the chip design frame with pads, the 
data-path, the sequencer, the Weinberger array and the flags 
block (see Figure 2.1). Input ports or signals are used to 
bring input data into the chip and output ports or signals 
are used to output data from the chip. The difference 
between ports and signals is that a port has as many bits as 
the data word defined by the programmer in the MacPitts .mac 


program and a signal is only a one-bit data element. 


B. THE DESIGN FRAME 

- The MacPitts compiler was designed to have no limit on 
the size of a circuit that it would design although large 
circuits may take several days of computer time to be 
completed. The design constraints that must be used for 
practical designs are the MOSIS chip size and pad number 
fabrication limitations. The current MOSIS limitation for 
the chip size is 7900 x 9200 microns and the maximum number 
of pads is 84. 

All pads are defined in the ''def' section of the MacPitts 

.mac program and are placed around the chip in the order 


specified in the program starting in the upper lefthand 
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Corner Of the chip and continuing in a clockwise direction. 
The version of the MacPitts silicon compiler installed at the 
Naval Postgraduate School will not place pads on the left edge 
of the chip. A newer version of the compiler that is 
avallabie commercially places pads on all four sides of the 
chip. All output pads are super buffered but the input data 
and clock pads are not. 

Along with the ground and power pads, the three-phase 
clock pads must also be defined in all MacPitts programs 
even though the clock may not be used in the circuit. The 
clock bus is always laid out on the chip. The MacPitts 
compiler uses a three-phase overlapping clock scheme where 
the clock period is divided into five segments as shown in 
Figure 2.2. This unusual clock scheme is used to drive the 
data storage registers and flaġs (see paragraphs C and E 
below) and according to (Ref. 4) allows a more compact 
lavout of the registers and flags. 

A reset pad must also be defined if the 'process' form 
is used in the .mac program even if the reset function is 
not used anywhere in the program. This is because the 
MacPitts compiler mav use the reset signal in its internal 
algorithms when it generates the chip [Ref. 2]. If the 
"always" form is the only form used in the .mac program the 


reset pad is not required. 
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Figure 2.2. MacPitts Five Segment Clock Period 
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C. THE DATA-PATH 

The data-path is the unit where all word size operations 
are performed. These operations consist of arithmetic 
functions (addition, subtraction, incrementing, decrementing 
and equals), boolean functions (and, or, not, nand, nor, 
xor and equ), data shifting operations, comparison tests and 
data storage and transfer using registers and ports [Refs. 2 
and 3]. The structure consisting of a one-bit slice of the 
above operations is referred to as an organelle and the LISP 
code used by the MacPitts compiler to generate each organelle 
can be found in the library and organelle sections of the 
MacPitts source code listing. 

The size of the data-path is determined by the number 
of bits in the data word (specified at the beginning of the 
.mac program) and the number of word size operations to be 
performed. The number of bits in the data word specifies 
the height of the data-path. The larger the data word the 
taller the data-path. The width is determined by the number 
of functions performed. When a specific function is to be 
performed in the data-path the organelle that performs that 
function is placed in the data-path. Replicas of that 
organelle (one for each bit of the data word) are stacked on 
top of each other. The organelle for the most Significant 
bit of the data word is on the bottom of the stack and the 
organelle for the least significant bit is on the top. The 


ordering of the organelles in the data-path is determined 


0 


by the ordering of the word operations specified in the .mac 
program. The first word operation encountered by the 
compiler in the .mac program is the first organelle in the 
data-path and so on. The compiler takes into account the 
size requirements of each organelle to scale the amount of 
space between organelles to allow enough room for connection 
lines, control lines, power lines, and local interconnection 
buses. Power and ground buses are also sized based on 
organelle power requirements [Ref. 3]. 

The routing of data to and from the data-path is very 
inefficient and requires many data lines to be longer than 
necessary. As seen in Figure 2.1, the chip pads are placed 
only on the “top, right and bottom sides of the chip. Data 
entering the chip on input ports and exiting the chip 
through output ports is routed from the left side of the 
data-path. This causes very long data lines. Data from the 
data-path to the Weinberger array is routed from the bottom 
side of the data-path to the top side of the array. 

All arithmetic and boolean function organelles are 
implemented using three basic gate structures. They are the 
NAND, NOR and inverter. Figure 2.3 shows an AND organelle 
that is made from a NAND gate and an inverter. In Figure 2.4 
an XOR function is implemented using NAND gates. An OR gate 
is implemented from a NOR gate and an inverter and the 


boolean EQU function is implemented using four NOR gates. 
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Figure 2.4. »kxclusive-OR Organelle Structure 


The one-bit full adder circuit in Figure 2.5 shows how 
a complex function can be implemented bv putting several 
different organelles together. In this case two XOR 
organelles and one NAND organelle are used. Figure 2.6 
shows how the one-bit adder is used bv MacPitts to build a 
two-bit full adder circuit with carry in. 

Two different structures are used in the data-path to 
define and store organelle inputs and outputs. They are the 
internal port and the register, and both are the same size 
as the data word. Internal ports are used primarily to 
transfer the output of an organelle to another organelle or 
to the Weinberger array within the same clock cycle or state 
period [Ref. 2]. Registers are used to store word size data 
elements. A one-bit register organelle consists of a 
master-slave flip-flop, as shown in Figure 2.7, that is 
controlled by the MacPitts three-phase clock [Ref. 3]. This 
structure allows the output of the register to be valid 
during a clock cycle even though a new input value could be 
in the process of being clocked into the register. The 
enable line in Figure 2.7 is used to control which clock 
cycles the register samples the input line for data storage. 
A memory refresh cycle is performed if new data is not 
stored during a clock cycle. If data is to be stored in 


every clock cycle the enable line is connected to Vdd. 


2 


۱1 ۲500 ۰00۷ Tin S32:14504N 6 5۵ BJ ۱ 


TAOD 


LNO 





851۰۱71000 dex NIƏ Ee Mcd 


CARRY IN 








bit @ AIN 


bit B BIN 


bit i AIN 


bit 1 BIN 


CARRY OUT 


Figure 2.6. 2-Bit Full iddertemrenit 


Ola 


2.۲ 


l-Bit Register Cell 


IN 


eet. 


s. 
mül” 


Ticürc 2.7: 


D. THE SEQUENCER 

The sequencer is a mini data-path and is placed on the 
chip between the data-path and the flags block. If the .mac 
program contains a process whose value depends on the system 
state, a sequencer is placed on the chip to control the 
system state of the chip. The sequencer usually contains 
registers to store the current system state. Every clock 
cycle the current system state is transferred to the 
Weinberger array from the registers and then the new system 
state is transferred from the Weinberger array to the 
sequencer for storage. Additional details about the 


sequencer are given in [Refs. 3, 7 and 8]. 


E. [HE FLAGS BLOCK 

Flags have a similar function to registers, but they 
store only one-bit of data from the Weinberger array. Flags 
also have a master-slave flip-flop structure but extra 
inverters are used in the flags block to drive the clock 
signals because there may be as many as twenty or thirty 
flags in the flags block (see Figure 2.8). The enable line 
of a flag performs the same function as the enable line of a 
register. Flags are placed side by side with the flags block 
increasing in width as more flags are needed. The rightmost 
structures in the flags block are the six inverters used to 
drive the three clock lines (see Figure 2.9). The leftmost 


flag in the flags block is the first flag encountered by the 
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compiler in the "always" or "process" section ов the .mac 
program. Each subsequent flag encountered by the compiler is 
۳۲۰۰۰۱ ده‎ the tight of the Previous #lag. Simce the flags 
block cannot expand in the vertical direction there is 

wasted space on the chip above the flags block if the 
data-path or sequencer is taller than the flags block. Also, 
if the .mac program requires a large number of flags the 
width of the flags block mav make the dimension of the chip 


exceed the MOSIS chip size constraints. 


F. THE WEINBERGER ARRAY 

ive Weinberger array, or control unit; or a MacPaitts 
designed chip is the unit where all chip control signals are 
generated and bit size boolean functions are performed. All 
inputs and outputs to the array are routed to the top of the 
array. Input and output signal lines are routed around the 
left side of the array and then to the top. 

The data lines connecting the Weinberger array to the 
data-path, sequencer, and flags block are called the "river". 
The algorithm that routes the "river" does not allow the 
data lines to cross each other so the left-to-right ordering 
of the functions performed in the array is determined by the 
i 32-1 ألت‎ ordering of the data transferred from the 
data-path, sequencer and flags block to the array. Array 
functions that use data from the data-path are placed in the 


left section of the array, array functions that use data from 
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the sequencer are placed in the center section of the array 
and array functions that use data from the flags block are 
placed in the right section of the array. Since no data 
lines in the "river" can cross each other data that is 
transferred between the data-path, sequencer or flags block 
must pass through the array even though no function is 
performed on the data in the array. 

The Weinberger array consists of a regular structure of 
NOR gates having arbitrary numbers of inputs. The pull-up 
transistors of the NOR gates are connected to Vdd at the 
bottom of the array and run vertically the full height of 
the array. Vertical ground wires run parallel to the pull-up 
transistor lines from the ground bus at the top of the 
array. Inputs to the NOR gates run horizontally through the 
array and form pull-down transistors when connected to ground 
and the NOR gate output line. The NOR gate output lines also 
run horizontally through the array and may be used as input 
lines to other NOR gates or routed to a flag or signal output 
pad. As more NOR gates are added to the Weinberger array or 
more inputs or outputs are added to each gate the array 
increases in width. The height of the array is determined 
by the number of horizontal interconnections between the NOR 
gates [Ref. 7]. 

Eight different boolean functions are implemented in the 
Weinberger array, all with NOR gates: NOR, AND, NAND, OR, 


EQU, XOR, parity and NOT [Ref. 2]. Figure 2.10 shows how an 


~ 
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۱۳۱۳ ۱ JO. NOR Gate Implementation of the KOR Function 


XOR function is implemented using NOR gates. The stick 
diagram of Figure 2.11 shows the Weinberger array implemen- 
tation of the XOR function from Figure 2.10 and Figure 2.12 
shows an actual Weinberger array layout of this function. 
The PLA and Weinberger array structures are very 
similar but there are several important differences. First, 
the PLA has only two levels of logic, the AND and the OR 
planes. The Weinberger array can have an arbitrary NOR gate 
depth. Although a PLA can implement the same functions 
performed in the Weinberger arrav the MacPitts designers 
found that when a boolean function was normalized in the 
sum-of-products form the Weinberger arrav's NOR gate depth 
allowed a much more compact structure than the PLA's 
[Ref. 3]. Another difference is that the complement of each 
input signal does not have to be available at the input of 
the Weinberger array as a PLA requires. The complements of 
array inputs are generated in the array if they are required. 
It has been found that the generation of the Weinberger 
array usually takes from 90% ~ 95% of the computer's 
compilation time in generating a MacPitts design. When an 
8-011 5-stage pipeline adder was designed using the 
MacPitts compiler 162 CPU minutes (about eight hours on a 
lightly loaded computer system) were required to complete this 
design. Most of this time was required to lay out the 228 


vertical control columns (the number of array inputs and 
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Weinberger Array Layout of an XOR Function 


۳۱۳۱۳ 2.12. 


АМҸ‏ و حو فح ودنن ححح وکت 


Дик [ከ 


mu 5 E 


و ኤ.ብ E‏ ۵د داد نت 
RI‏ 1 
ASLI‏ 


SS ki 


ҺА 


m. 


ü 
9 


i 


d‏ اج 


SESS 


DBA SAABAASS SAAS TSS ASAT AS SSS SSS DY 
“ua ona uğ a ə ə ፡-ኤ“ዱ ኡ 


a E Si 


ch aha Dao ddd REED aw BEAL SLaE SEE RS DED) 


SS ASA SASAASSS, 


: = | ሜኽ-ኤ.ኤ.ጤ.ኬ.ኤ.ኤ.ሕ. ኤ.ሴ. ኬ.ኤ.ኤ..-ኹ እኮ. ኤ-ሕ.ኤ.ኤ.፡. 3. ጆቫ Bl 


كذ 5 سوواط 


A 
ws 
80 
7 SE 2 
a 6.1.1. 8 
Ди. [23 X 
ОЛА 
ħi L 
H 


al ESPON A TL RE SAS SASS SAAS SAS 35 ok X. 1 


Qorx 
“o 


— nı 
m9 ام‎ 
2۳ 1 HH 
EU HAS .И2п1 
e SE fit sen 
pe اس‎ 
ии 0 h ки ии 
HH U HSU | LH HH HUES 
LH USE ሚሜ 00...” 
HHHH ARR ИЛ 1 LH HB 
BIN TE ya 
2 


1 ۳ 


۹د ھ, سی 


99 
<4 | - 


7 RAE له‎ s ር 

4d psu HA A 9 d 2 

4 00.5.) x SE 
HH HEU HHA fi i ta A A L 

src cr (۵ 3‏ جع سس 
no. á , H U‏ 


፡ 


ሀ 
vere ው? 
ፃመ" መ 
———u 


: تا‎ 
LW әФф 
‹ Ka 
AA < 


لبيك 
t‏ 





34 


outputs plus the number of nor gates in the array) and the 

81 horizontal control tracks (the number of nor gate inputs 
and outputs in the array). When a 16-bit 5-stage pipeline 
adder design was attempted which contained 435 control 
columns and 157 control tracks the design process was killed 
after 4800 CPU minutes (four days) were spent designing the 
Weinberger array. When the size of the Weinberger array of a 
A-bit 5-stage pipeline adder (126 columsn and 43 tracks) is 
compared with the size of the 8-bit and 16-bit adders it 

can be seen that the Weinberger array becomes nearly four 
times larger and more complex in this 5-stage pipeline design 


when the size of the data word is doubled. 
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III. THE DESIGN OF ADDER CIRCUITS 


A. COMBINATIONAL ADDERS 

The design of combinational adder circuits with the 
MacPitts compiler is more straightforward than the design of 
pipeline adder circuits. The output sum of a combinational 
adder depends only on the present inputs to the circuit. 
Unfortunately, several compiler design constraints cause the 
combinational adder design to be more complicated than 
necessary. 

The compiler adds two input vectors (ain and bin) in 
the data-path using the ripple carry full adder circuit 
shown in Figure 2.5. The first problem occurs when trying to 
add the input carry (cin) to the first Dit © ain and bin. 
Since cin is a one-bit sized data element and the data-path 
can only manipulate word size data elements cin must be 
converted to a word sized data element. This requires 
additional circuitry in the data-path and the Weinberger 
array and also additional statements in the MacPitts .mac 
program. 

A second problem occurs because the MacPitts language 
in which the .mac program is written allows only two 
variables in the addition function [Ref. 2]. ۸11 MacPitts 
functions are limited to one or two variables. It is assumed 


that the number of variables in a MacPitts function was 
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limited by the compiler designers to simplify the design of 


the compiler. “The simple LISP addition function of 
(CT ain bin cin) 


is accomplished in MacPitts with the more complicated 


function 
Ain bin cin) 2 


This embedded addition causes tvo full adder circuits to be 
connected in cascade. In the first full adder bin is added 
to cin and this sum is added to ain in the second full adder. 

Aimthird problemWis that the carry in and carry out 
lines of the full adder cannot be addressed by the 
programmer. They are only used to ripple the carry bits 
between full adder stages. The carry in of the bit O full 
adder is connected to ground; the carry out of the last 
full adder stage is not connected to anything. If a chip 
carry out is desired it must be generated by additional 
Circuitry in the Weinberger array. 

Figure 3.1 shows a block diagram of the data-path for a 
two bit combinational adder circuit with carry in. In 
Figure 3.2 the .mac program for a 4-bit combinational adder 
is shown. Lines 14 and 15 convert the carry in signal to a 
word size data element. The least significant bit of the 
carry in word is set to 1 or O depending on the value of the 


carry in signal. All other bits of the carry in word are 
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1 ;adder 4-bit combinational 
2 (program add 4 
3 (def 1 ground) 


4 (def ain port input (2 3 4 5)) input vector 
5 (def bin port input (6 7 8 9)) input vector 
6 (def res port output (10 11 12 13)) ;output vector 
7 (def cin signal input 14) carry in 

8 (def carry port internal) 


9 (def 15 phia) 
10 (def 16 phib) 
11 (def 17 phic) 
12: (def 18 power) 


13 (always 
14 (cond (cin [setq carry 1)) 
15 (t (setq carry 0))) 


16 (setq res (+ ain (+ bin carry))))) 


eure € TT bit Combinational sddereemac Program 


set to O regardless of the value of the carry in signal. 
This can be seen on the left side of Figure 3.1. Figure 3.3 
and 3.4 show the 4 micron MacPitts designs for a 4-bit and an 
8-bit combinational adder. The size of the 4-bit adder is 
2.292mm x 2.398mm and the size of the 8-bit adder is 3.508mm 
x 3.614mm. As shown, the size of the chip frame is larger 
than required by the circuitry inside the chip. A larger 
frame is needed because pads can be placed only on three 
Sides of the frame. The frame could be smaller and the chip 
area could be more effectively used if pads were placed on 
all four sides of the frame. Both of these MacPitts designs 
produced correct simulations when simulated by the event 
driven switch level simulator, esim, using the procedure 


outlined in [Ref. 4]. 


B. PIPELINE ADDERS 

The purpose of pipelining a circuit is to increase the 
throughput of the circuit. The combinational logic of a 
circuit is partitioned into several smaller functional units 
or stages and storage registers are placed between each 
stage. During each clock period data is clocked from the 
input storage register of each stage through the 
combinational logic of the stage and into the output storage 
register of the stage. Also, during each clock period a 
result exits the pipeline. Since the combinational logic in 


each stage of the pipeline circuit is less than the total 
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Figure 3.3. MacPitts Design of a 4-Bit Combinational Adder 
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Figure 3.4. MacPitts Design of a 8-Bit Combinational Adder 
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combinational logic of the combinational ciremsse the 
pipeline circuit has a shorter logic propagation delay 
during each clock period. This allows the pipeline circuit 
to operate at a faster clock speed and higher data output 
Hoea throughput) than the combinational circuit. ል 

a advantage of a pipeline circuit is the latency caused by 
the time that is required to fill and empty the pipeline. 
Reference 9 should be consulted for more information on 
pipelining. 

There are many different algorithms that can be used to 
design a pipeline adder circuit. The block carry-look-ahead 
(BLCA) addition algorithm [Ref. 10] was used so that a 
comparison could be made between a MacPitts designed 
pipeline adder circuit and the hand-crafted pipeline adder 
circuit designed by Conradi and Hauenstein [Ref. 5]. 
Equations 6.1 thru 6.12 of [Ref. 5] are used to implement 
the BCLA addition algorithm. As described in İRef. o), the 
BLCA pipeline adder can be conveniently divided into the 
following five stages: 

1. Calculate the carry generate 000 and the 32237 


propagate (Ру) from the input addition operands. 


A.B 
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il 30007 
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Calculate one block generate o) and block propagate 
Er for every four bits of the addition operands 


from the Gi's and pis. 


Bo 170,123 60 : 22.1 


SZ. = ሦ ተው 14+ተ2 1+ጊ 1 
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001020 0: 127 


Calculate the block carry (BC, ) for each carry block. 


BC. = BG *С (ВР, 

BC, = 86+806 ВР *С jBPQgBP, 

BC); = BG.+BGJ BP. +BG OBP BP.+C_ BP oBP BP , 

DBC < = 8073ተ8035፻ን+86 BP BP o+BG O BP BPO BPo+C_J BP BP, BP BP. 


Calculate the look-ahead-carry (C.) for each bit of the 


operands. 
Соав 127 0,4 8120 2-1 3 7 11 ии. 
Ci 5,9,13 “1,5,9. 13 0,1,2 TE RRE 
+BC 
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777: :)culate the sum bits (31). 


S; = (A,)XOR(B,)XOR(C, .) 


l 

The Conradi and Hauenstein (Ref. 5) pipeline adder had 
only four stages. Stages l and 2 were combined by writing 
the equations describing the s and —- in terms of the 
input operands instead of in terms of the G.'s and pis. The 
MacPitts pipeline adders contain five stages because the 
increased stage propagation delay caused by combining stages 
1 and 2 could slow the clock speed of the circuit and the 
fastest possible clock speed is desired. 

Figure 3.5 shows the .mac program for a 4-bit S-stage 
pipeline adder circuit. The carry in of the chip is used in 
all stages of the pipeline so a separate storage location is 
required for each stage as shown in lines 26 thru 29. The 
carry propagate and carry generate calculated in stage l are 
used in stages 4 and 5, respectively, so multiple storage 
locations are also used for these quantities. The 


calculations of the carry generate and carry propagate, 
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1 (program addp 4 
2 ;This adder uses block carry lookahead (BCLA) addition 


(def 1 ground) 


(def ain port input (2 3 4 5)) ;input vector 
(def bin port input (6 7 8 9)) ;input vector 


‘carry into chip 
‘bit 3 sum 

‘bit 2 sum 

;bit 1 sum 

;bit 0 sum 

;carry out of chip 
;clock phases 


carry propagate-stage one 
: -stage two 
-stage three 
-stage four 
"carry generate-stage one 
i -stage two 

; -stage three 
;block carry propagate 
;block carry generate 
;block carry 

;cin-stage one 

; -Stage two 

; -stage three 

; -stage four 

;bit O carry 

;bit 1 carry 

;bit 2 carry 
;bit 3 carry 

scout flag 

;bit sum flags 


-mac Program for a 4-Bit o-Stage 


(def cin signal input 10) 
(def sum3 signal output 11) 
(def sum2 signal output 12) 
(def sum] signal output 13) 
(def sumO signal output 14) 
(def cout signal output 15) 
(def 16 phia) 

(def 17 phib) 

(def 18 phic) 

(def 19 power) 

(def pl register) 

(def p2 register) 

(def p3 register) 

(def p4 register) 

(def g1 register) 

(def g2 register) 

(def g3 register) 

(def bpO flag) 

(def bgO flag) 

(def bc3 flag) 

(def carrvi flag) 

(def carrv2 flag) 

(def carry3 flag) 

(def carry4 flag) 

(def cO flag) 

(def c1 flag) 

(def c2 flag) 

(def c3 flag) 

(def carryout flag) 

(def addO flag) 

(def add! flag) 

(def add2 flag) 

(def add3 flag) 


(alwavs 


3 


“1 & OF ጩጨ 


Oo 


9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
26 
27 
28 
29 
30 
əl 
32 
33 
34 
35 
36 
37 
38 
39 
40; 
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42 ; 


Figure onos 


43 (par (setq pl (word-xor ain bin)) 


44 (setq gl (word-and ain bin)) 

45 (cond (cin (setq carry 1 t)) 

46 (t (setq carry] f)))) 

47 ; 

48 ;Stage Two 

49 ; 

50 = (par (setq bpO (and (bit 3 p1) (bit 2 pt) (bit 1 pl) (bit O p1)J) 
51 (setq bgO (or (bit 3 g1) (and (bit 2 g1) (bit 3 p1)) 
52 (and (bit 1 g1) (bit 2 pl) (bit 3 p1)) 

53 (and (bit O gl) (bit 1 pl) (bit 2 pl) (bit 3 p1)))) 
54 (setq p2 pl) 

55 (setq g2 gl) 

56 (setq carry2 carry1)) 

50 

58 :Stage Three 

59 ; 

60 (par (setq bc3 (or bgü (and carry2 bpü))) 

61 (setq p3 p2) 

62 (setq g3 g2) 

63 (setq carry3 carry 2)) 

64 ; 

65 ;Stage Four 

66 ; 

67 (par (setq cO (or (bit O g3) (and carry3 (bit O p3)))) 
68 (setq c1 (or (bit 1 g3) (and (bit O g3) (bit 1 p3)) 
69 (and carry3 (bit O p3) (bit 1 p3)))) 

70 (setq c2 (or (bit 2 g3) (and (bit 1 g3) (bit 2 p3)) 
71 (and carry3 (bit 0 p3) (bit 1 p3) (bit 2 p3)))) 
72 (setq p4 p3) 

13 (setq c3 bc3) 

14 (setq carry4 carry 3)) 

og 

76 ;Stage Five 

иј: 

78 (par (setq add0 (xor (bit 0 p4) carry4)) 

79 (setq addı (xor (bit 1 p4) cÜ)) 

80 (setq add2 (xor (bit 2 p4) c1)) 

81 (setq add3 (xor (bit 3 p4) c2)) 

82 (setq carrvout c3) 

83 (setq sumO add0) 

84 (setq sum] add1) 

85 (setq sum2 add2) 

86 (setq sum3 add3) 

87 (setq cout carrvout)))) 


Figure 3.5. MacPitts .mac Program for a 4-Bit 5-Stage 
Pipelinem ddem €rirenmit (cont.) 
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shown in lines 43 and 44, are performed with the word-xor 
and the word-and functions in the data-path. All functions 
in stages 2 thru 5 require the manipulation of bit size data 
elements. These functions are performed in a large 
Weinberger array. Registers and flags are used to store the 
input and output data of each stage. It takes less circuitry 
(and fewer statements in the .mac program) to manipulate 
word size data elements and store them in registers than to 
manipulate bit size data elements and store them in flags. 
Since there is no MacPitts function to set the bits of a 
word to a particular value the bit sized output data 
elements of stages 2 through 4 cannot be combined into words 
and stored in registers. The output data of stages 2 through 
4 must be stored in flags and this requires a very large 
flags block. 

A pipeline circuit designed by the MacPitts compiler 
does not perform like a standard pipeline circuit as 
described in [Ref. 9] because the input data of each stage 
is valid before the start of the clock period. When data is 
stored in a MacPitts register or flag the data is valid on 
the register or flag output line before the end of the clock 
period (see Figures 2.2, 2.7, and 2.8). The data then starts 
to propagate through the combinational logic of the next 
stage before the start of the next clock period. During the 
next clock period the data will continue to propagate 


through the stage combinational logic during the first two 
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clock period segments and will then be stored in registers 
or flags during the third clock period segment. 

Since the output data of stage four of the pipeline 
5121112 starts propagating through the logic ofestage=fiwe 
before the start of the next clock period it is possible 
that the data could propagate all the way through the logic 
of stage 5 and reach the output pads of the chip at the 
beginning or before the start of the next clock period. This 
causes the fifth stage of the pipeline to De essentially 
part of the fourth stage. For comparison purposes with the 
Conradi and Hauenstein adder [Ref. 5] it is desired that the 
fifth stage of the circuit be separate from the fourth. To 
insure that there are five stages in the pipeline instead of 
four the sum bits and carrv out bit of the adder operands 
ህከ ር! ተኝ51.18 ኖርርፎር፡፡.፡.]ስ]፡፡ሽ ከ፡!፡ ህህ...) are stored in flags 
and then the outputs of the flags propagate to the output 
pins (lines 83 thru 87) during the end of the clock period. 

The layout of a 4 micron 4-bit 5-stage pipeline adder, 
which uses only one carry-look-ahead block, is shown in 
Figure 3.6. Figure 3.7 shows a block diagram of this 
Gircuit. The size of the circuit is 4.828mm x 2.918mm. As 
shown, the circuit has a large Weinberger array and flags 
block. The long flags block takes up almost half the width 
of the chip and causes a lot of wasted area on the right of 
the Weinberger array. An 8-bit 5-stage pipeline adder will 


contain two carry-look-ahead blocks. The .mac program of the 
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8-bit adder is shown in Figure 3.8 and the circuit layout is 
shown in Figure 3.9. The block diagram of the 8-bit adder 
would be the same as the block diagram of the 4-bit adder 
shown in Figure 3.7. The size of the 8-bit adder circuit is 
6.650mm x 4.358mm. The data-path is twice as tall, the flags 
block is almost twice as long and the area of the Weinberger 
array is four times larger in the 8-bit adder than in the 
4-bit adder. 

An attempt was made to design a 16-bit 5-stage pipeline 
adder with the MacPitts compiler. The compiler was able to 
design all but the large Weinberger array which is four 
times larger than the 8-bit adder array. The program that 
designs the Weinberger array uses a recursive algorithm and 
the depth of recursion is limited by the amount of memory 
available to the LISP compiler. Since the array of the 
16-bit adder circuit is so large the limit of the depth of 
recursion was reached. 

The 16-bit pipeline adder contains four carrv-look-ahead 
blocks. When the .mac program of the 16-bit adder (Figure 
3.10) is compared to the .mac programs of the 8-bit and 
4-bit adders (Figures 3.6 and 3.9) the programs are 
essentially the same except for additional statements in 
stages 2 through 5 due to the larger 16-bit data word and 


due to the additional carry-look-ahead blocks. 
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1 (program addp 8 
2 ;This adder uses block carry lookahead (BCL A) addition 


(def 1 ground) 

(def ain port input (2345 6 7 8 9)) ;input vector 
(def bin port input (10 11 12 12 14 15 16 17)) ;input vector 
(def cin signal input 1&) carry into chip 
(def sum? sıgnal output 19) :bit 7 sum 

(def suin6 signal output 20) ;bit 6 surn 

(def sum5 signal output 21) ;bit 5 sum 

(def sum4 signal output 22) ;bit 4 sum 

(def sum3 signal output 23) :bit 3 sum 

(def surn? signal output 24) ;bit 2 sum 

(def sum] signal output 25) ;bit 1 sum 

(def sumO signal output 26) ;bit O sum 

(def cout signal output 27) ‘carry out of chip 
(def 28 phia) clock phases 

(def 29 phib) 

(def 30 phic) 

(def 31 power) 


(def pl register) ‘carry propagate-stage one 
(def p2 register) -stage two 
(def p3 register) : -stage three 
(def p4 register) -stage four 
(def g1 register) ;carry generate-stage one 
(def g2 register) ፡ -stage two 
(def g3 register) ; -stage three 
(def bpO flag) ;block carry propagate 
(def bp1 flag) 

(def bg0 flag) ‘block carry generate 
(def bg! flag) 

(def bc3 flag) ;block carry 

(def bc7 flag) 

(def carrvi flag) "Cin-stage one 

(def carry2 flag) , -stage two 

(def carry3 flag) ; -stage three 

(def carry4 flag) ; -stage four 

(def cO flag) bit Ü carry 

(def c1 flag) ከ11 1 carry 

(def c2 flag) bit 2 carry 

(def c3 flag) Һи 3 саггу 

(def c4 flag) :bit 4 carry 

(def c5 flag) ;bit 5 carry 

(def c6 flag) ان‎ 6 carry 

(def c7 flag) bite 7 carry 

(def carryout flag) ;cout flag 

(def addO flag) ;bit sum flags 


(def add1 flag) 
MacPitts .müc Program for a S-Dit o-Stage 
Pipeline Adder Circuit 
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42 
43 
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4፣ 

Figure 3.8. 


48 (def add2 flag) 


49 (def add3 flag) 
50 (def add4 flag) 
51 (def add5 flag) 
52 (def add6 flag) 
53 (def add? flag) 
54 (always 

نوہ 

56 :Stage One 

b 


58: (par (setq pl (word-xor ain bin)) 


59 (setq g1 (word-and ain bin)) 

60 (cond (cin (setq carry t)) 

61 (t (setq carryl f)))) 

62. 

63 ;Stage Two 

64: 

65 (par (setq bp0 (and (bit 3 p1) (bit 2 p1) (bit 1 p1) (bit 0 p1))) 
66 (setq bp1 (and (bit 7 p1) (bit 6 pi) (bit 5 pl) (bit 4 p1))) 
67 (setq bgü (or (bit 3 gi) (and (bit 2 g1) (bit 3 p1)) 
68 (and (bit 1 gi) (bit 2 p1) (bit 3 p1)) 

69 (and (bit 0 gi) (bit 1 pl) (bit 2 p1) (bit 3 p1)))) 
70 (setq bgl (or (bit 7 g1) (and (bit 6 g1) (bit 7 p1)) 
71 (and (bit 5 g1) (bit 6 pl) (bit 7 p1)) 

72 (and (bit 4 gl) (bit 5 p1) (bit 6 p1) (bit 7 p1)))) 
73 (setq p2 pl) 

74 (setq g2 g1) 

75 (setq carry2 carryl)) 

5 

77 "Stage Three 

18: 

79 (par (setq bc3 (or bgO (and carry2 bp0))) 

80 (setq be7 (or bgl (and bgü bp1) (and carry2 bpO bp1))) 
81 (setq p3 p2) 

82 (setq g3 g2) 

83 (setq carrv3 carrv2)) 

84: 

85 ,Stage Four 

207 

87 (par (setq cO (or (bit 0 g3) (and carry3 (bit 0 p3)))) 

88 (setq c1 (or (bit 1 g3) (and (bit O g3) (bit 1 p3)) 
89 (and carry3 (bit O p3) (bit 1 p3)))) 

90 (setq c2 (or (bit 2 g3) (and (bit 1 g3) (bit 2 p3)) 

91 (and carry3 (bit O p3) (bit 1 p3) (bit 2 p3)))) 

92 (setq c3 bc3) 

93 (setq c4 (or (bit 4 g3) (and bc3 (bit 4 g3)))) 


Figure 3.8. M@cPitts .mac Procram IST 1 GC DIL 77 
Pipeline Adder Circüli 0) 
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94 (setq c5 (or (bit 5 g3) (and (bit 4 g3) (bit 5 p3)) 


95 (and bc3 (bit 4 p3) (bit 5 p3)))) 

96 (setq c6 (or (bit 6 g3) (and (bit 5 g3) (bir 6 p3)) 
97 (and (bit 4 g3) (bit 5 p3) (bit 6 p3)) 
98 (and bc3 (bit 4 p2) (bit 5 p3) (bit 6 p3)))) 
99 (setq c7 bc?) 

100 (setq p4 p3) 

101 (setq carry4 carry3)) 

102 ; 

103 ;Stage Five 

104 ; 

105 (par (setq addO (xor (bit O p4) carry 4)) 
106 (setq addl (xor (bit 1] p4) c0)) 

107 (setq add2 (xor (bit 2 p4) c1)) 

108 (setq add3 (xor (bit 3 p4) c2)) 

109 (setq add4 (xor (bit 4 p4) c3)) 

110 (setq add5 (xor (bit 5 p4) c4)) 

111 (setq add6 (xor (bit 6 p4) c5)) 

112 (setq add? (xor (bit 7 p4) c6)) 

113 (setq carryout c7) 

114 (setq sumü addü) 

115 (setq sum1 add1) 

116 (setq sum2 add2) 

117 (setq sum3 add3) 

118 (setq sum4 add4) 

119 (setq sum5 add5) 

120 (setq sum6 addö) 

121 (setq sum7 add7) 

122 (setq cout carryout)))) 


Figure 378. MacPitts .mae Program for R S-Bit 5S5tage 
piperi mo or Circuit (cont.) 
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RIN 


I (program addp 16 

2. :This adder uses block carry lookahead (BCLA) addition 
3 (def 1 ground) 

(def ain port input (2344567891011 12 13 14 15 16 17)) 
(debi port input | these Ie 2h 25 26 27 28: 29 30 31 32:33)) 
(def cin signal input 24) 

(def sumlà signal output 35) 
(def sum14 signal output 26) 
9 (def suml12 signal output 37) 
10 (def sum12 signal output 38) 
11 (def sum11 signal output 29) 
12 (def surntO signal output 40) 
13 (def sum9 signal output 41) 
14 (def sum8 signal output 42) 
15 (def sum7 signal output 43) 
16 (def sum6 signal output 44) 
17 (def sum5 signal output 45) 
18 (def sum4 signal output 46) 
19 (def sum3 signal output 47) 
20 (def sum2 signal output 48) 
21 (def suml signal output 49) 
22 (def sumO signal output 50) 
23 (def cout signal output 51) 
24 (def 52 phia) 

25 (def 53 phib) 

26 (def 54 phic) 

27 (def 55 povver) 

28 (def pl register) 

29 (def p2 register) 

30 (def p3 register) 

31 (def p4 register) 

32 (def gl register) 

33 (def g2 register) 

34 (def g3 register) 

35 (def bpO flag) 

36 (def bpl flag) 

37 (def bp2 flag) 

38 (def bp3 flag) 

39 (def bgü flag) 

40 (def bg] fag) 


-1[ 5 ct da 


oc 


Elizune? 105“Y1acP1065 “mac "b roerum for a 16-Bit 
5-Stage Pipeline Adder Circuit 


(def bg2 flag) 
(def bg3 flag) 
(def be3 flag) 
(def bc? flag) 
(def bc11 flag) 
(def bc15 flag) 
(def carry! flag) 
(def carry2 flag) 
(def carry3 flag) 
(def carrv4 flag) 
(def cO flag) 
(def c1 flag) 
(def c2 flag) 
(def c3 flag) 
(def c4 flag) 
(def c5 flag) 
(def c6 flag) 
(def c7 flag) 
(def c8 flag) 
(def c9 flag) 
(def c10 flag) 
(def c11 flag) 
(def c12 flag) 
(def c13 flag) 
(def c14 flag) 
(def c15 flag) 
(def carryout flag) 
(def add0 flag) 
(def add1 flag) 
(def add2 flag) 
(def add3 flag) 
(def add4 flag) 
(def add5 flag) 
(def add6 flag) 
(def add7 flag) 
(def add8 flag) 
(def add9 flag) 
(def add10 flag) 
(def add11 flag) 
(def add12 flag) 


41 
42 
49 
44 
45 
46 
47 
48 
49 
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.mac Program ии 
5-Stage Pipeline Adder Circuit (Cont) 


58 


MacPitts 


Figure "10: 


81 (def add15 flag) 
82 (def add14 flag) 
83 (def add15 flag) 


۶۸۲۵ ۵ 


85: 

86 ;Stage One 

ҜЕТ : 

88 (par (setq pl (word-xor ain bin)) 

80) (setq g1 (word-and ain bin)) 

90 (cond (cin (setq carry 1 t)) 

91 (t (setq carrvi f)))) 

37 

93 Stage Two 

94 

95 (par (setq bpO (and (bit 3 pl) (bit 2 pl) (bit 1 pl) (bit O p1))) 
96 (setq bp] (and (bit 7 pl) (bit 6 pl) (bit 5 pl) (bit 4 pl))) 
97 (setq bp2 (and (bit 11 pl) (bit 10 pl) (bit 9 pl) (bit 8 p1))) 
98 (setq bp3 (and (bit 15 pl) (bit 14 pl) (bit 13 pl) 

99 (bit 12 p1))) 

100 (setq bgü (or (bit 3 g1) (and (bit 2 g1) (bit 3 p1)) 

101 (and (bit 1 gl) (bit 2 pl) (bit 3 p1)) 

102 (and (bit O gi) (bit 1 pl) (bit 2 pl) (bit 3 p1)))) 
103 (setq bg! (or (bit 7 gl) (and (bit 6 gi) (bit 7 p1)) 

104 (and (bit 5 gi) (bit 6 pil) (bit 7 pi)) 

105 (and (bit 4 gi) (bit 5 pl) (bit 6 pil) (bit 7 p1)))) 
106 (setq bg2 (or (bit 11 gi) (and (bit 10 gi) (bit 11 pi)) 

107 (and (bit 9 gi) (bit 10 pl) (bit 11 ((ام‎ 

108 (and (bit 8 g1) (bit 9 p1) (bit 10 p1) (bit 11 p1)))) 
109 (setq bg3 (or (bit 15 gi) (and (bit 14 gl) (bit 15 pl)) 

110 (and (bit 12 gl) (bit 14 pl) (bit 15 pl)) 

11] (and (bit 12 g1) (bit 13 p1) (bit 14 p1) (bit 15 ((((ام‎ 
112 (setą p2 pl) 

113 (setq g2 g1) 

114 (setq carry2 carry1)) 

113; 

116 ;Stage Three 

117 ; 

118 (раг (setq bc3 (or bgü (and carry2 bpü))) 

119 (setq be? (or bgl (and bgü bp1) (and carry2 bp bpl1))) 
120 (setą bcti (or bg2 (and bgl bp2) (and bgö bp bp2) 


MacPitts .mac Program for a 16-Bit 
5-Stage Pipeline Adder Circuit (conte) 


Figure 3.10. 


121 (and carrv2 bpO bpl bp2))) 


122 (setq be15 (or bg3 (and bg2 bp2) (and bg] bp2 bpd) 
123 (and bgO bpl bp2 bp) (and carry? bpü bpl bp2 bp3))) 
124 (setq p3 p2) 

125 (setq g3 g2) 

120 (*etq carry 2 carry2]) 

1 

128 Stage Four 

129 : 

130 (par (setq c0 (or (bit 0 g3) (and carry3 (bit 0 p3)))) 

151 (setq c1 (or (bit 1 g3) (and (bit 0 g3) (bit 1 p3)) 

22 (and carrv3 (bit O p3) (bit 1 p3)))) 

133 (setq c2 (or (bit 2 g3) (and (bit 1 g3) (bit 2 p3)) 

134 (and carry3 (bit 0 p3) (bit 1 p3) (bit 2 p3)))) 
135 (setq c3 bc3) 

136 (setq c4 (or (bit 4 g3) (and be3 (bit 4 g3)))) 

153 (setq c5 (or (bit 5 g3) (and (bit 4 g3) (bit 5 p3)) 

138 (and bc3 (bit 4 p3) (bit 5 p3)))) 

139 (setq c6 (or (bit 6 g3) (and (bit 5 g3) (bit 6 p3)) 

140 (and (bit 4 g3) (bit 5 p3) (bit 6 p3)) 

141 (and bc3 (bit 4 p3) (bit 5 p3) (bit 6 p3)))) 
142 (setq c7 bc?) 

143 (setq c8 (or (bit 8 g3) (and bc? (bit 8 p3)))) 

144 (setq c9 (or (bit 9 g3) (and (bit 8 g3) (bit 9 p3)) 

145 (and be? (bit 8 p3) (bit 9 p3)))) 

146 (setq c10 (or (bit 10 g3) (and (bit 9 g3) (bit 10 p3)) 
147 (and (bit 8 g3) (bit 9 p3) (bit 10 p3)) 

148 (and be7 (bit 8 p3) (bit 9 p3) (bit 10 p3)))) 
149 (setq c11 bc11) 

150 (setq c12 (or (bit 12 g3) (and bell (bit 12 p3)))) 

151 (setq c13 (or (bit 13 g3) (and (bit 12 g3) (bit 13 p3)) 
152 (and bell (bit 12 p3) (bit 13 p3)))) 

15: (setq c14 (or (bit 14 g3) (and (bit 13 g3) (bit 14 p3)) 
134 (and (bit 12 g3) (bit 13 p3) (bit 14 p3)) 

155 (and bell (bit 12 p3) (bit 13 p3) (bit 14 p3)))) 
156 (setq c15 bc15) 

157 (setq p4 p3) 

158 (setq carry4 carry3)) 


Figure 3.10. MacPitts .mac Program for a 16 prt 
5-Stage Pipeline Adder Ċircuit ۳) 


60 


139" 
160 Stage Five 


161: 

162 (par (setq addO (xor (bit O p4) carry4)) 
165 (setq addl (xər (bit 1 :p1) c0)) 
164 (setq add? (xor (bit 2 p4) c1)) 
165 (əetq add3 (xor (bit 3 pi) e2)) 
166 (setq add4 (xor (bit 4 p1) c2)) 
167 (setq adda (xor (bit 5 p4) c4)) 
168 (setq add6 (xor (bit 6 p4) c5)) 
169 (setq add? (xor (bit 7 p4) c6)) 
170 (setq adds (xor (bit 8 p4) c7)) 
171 (seta add9 (xor (bit 9 p4) c8)) 

2 (setq add10 (xor (bit 10 p4) c9)) 
|. (setq add11 (xor (bit 11 p4) c10)) 
174 (setq add12 (xor (bu 12 p4) c11)) 
175 (setq add13 (xor (bit 13 p4) c12)) 
176 (setq add14 (xor (bit 14 p4) c13)) 
17 (setq add15 (xor (bit 15 p4) c14)) 
178 (setq carryout c15) 

179 (setq sumO add0) 

180 (setq sum] addl) 

181 (setq sum2 add2) 

182 (setq sum3 add3) 

183 (setq sum4 add4) 

184 (setq sum5 add5) 

185 (setq sumó add6) 

186 (setq sum? add?) 

187 (setq sum8 а448) 

188 (setq sum9 add9) 

189 (setq sum10 add10) 

190 (setq suml1 add11) 

191 (setq sum12 add12) 

192 (setq sum13 add13) 

193 (setq sum 14 add14) 

194 (setq sum15 add15) 

195 (setq cout carryout)))) 


|... cs MacPaitts -mac Program for a 18-Bit 
5-Stage Pipeline Adder Circuit (cont. ) 


OT 


Initially, simulations on the 4-bit and 8-bit pipeline 
adder circuits could not be performed due to numerous wiring 
and alignment errors in the MacPitts designs. These errors 
are discussed in Chapter V of this thesis. After all of the 
wiring and alignment errors were corrected the two adders 


produced correct simulations using esim. 
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IV. DESIGN PERFORMANCE COMPARISONS 


A. TIMING ANALYSIS USING CRYSTAL 
EZ Introduction 

Crvstal is a VLSI circuit delav analvsis program 
developed at the Universitv of California at Berkelev. The 
slowest paths in the circuit are determined bv Crvstal and 
this information can be used to calculate the maximum clock 
Gpeciotasthescircuit:GVersion 2፡60 Crystal-found in the 
berk85 VLSI design tools available on the UNIX VAX computer 
svstem was used for all timing and delav analvsis. 

Crystal reads circuit description information from a 
-Sim file created by the circuit extractor program Mextra 
and then accepts commands from the programmer from the 
terminal keyboard. There are seven categories of Crystal 
commands and they must appear in the following order when a 
timing analysis is performed: model commands, circuit 
commands, dynamic node commands, check commands, setup commands, 
delay commands and miscellaneous commands. References 6 
and 11 should be consulted for a complete listing of all 
Crystal commands and their use. Output from Crystal is 
written on the terminal screen and can be stored in a file 
if the UNIX "script" command is executed before the timing 


analysis is started. 
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2. Combinational Circuits 
a. Performing a Delay Analysis 

Combinational circuits are the easiest circuits 
to analyze using Crystal. First, all input and output pads 
should be labeled using the VLSI circuit editor Caesar. The 
label can be any combination of distinctive ASCII characters 
except space, tab, newline, double quote, comma, semi-colon 
and parenthesis and must not start or end with a number. 
Next a .sim file is created using Mextra with a -o option. 
Only four commands: "inputs", "outputs", "delay" and 
"critical" are necessary to analyze the circuit. The 
commands "inputs" and "outputs" are used to identify the 
input and output signals of the combinational circuit. 
Delay commands are used to tell Crystal when input signals 


change value [Ref. 6]. The form of the delay command is: 


delay (signal name) tr tf 


where tr is the time that the signal will rise to l and tf 
is the time that the signal will fall to O. An example of a 


delay command is: 


delay ain 3 0 


This delay command specifies that the time that ain will 
rise is 3ns and the time that ain will fall is Ons. This 
means that ain is initially set to O and will rise to 1 ins 


later. If a negative time is used in a delay statement a 
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transition of that signal will not occur after time 0. This 
allows the programmer to have input signals stable at the 
start of the timing analysis. The command "critical" directs 
Crystal to calculate the slowest path in the circuit. 

Two other commands, "check" and "clear", may also 
be useful. The check command performs a static electrical 
check on the circuit. Information about nodes with no 
transistors connected to them, nodes that are not driven, 
nodes that don't drive anything, transistors that are 
permanently forced off, transistors connecting Vdd and GND, 
and transistors that are bidirectional is printed to the 
screen [Ref. 11]. All of this information, except for the 
information on the bidirectional transistors, is not very 
useful in a Macpitts generated circuit. This is because 
when the MacPitts silicon compiler does not use part of an 
organelle in a chip design the unused circuitry is left in 
the design resulting in improperly connected nodes and 
transistors. A bidirectional transistor is a transistor for 
which Crystal cannot determine the direction of signal flow 
within the transistor. To prevent Crystal from calculating 
circuit delays along impossible paths, bidirectional 
transistors must be labeled to show signal directions. (See 
paragraph 3.a. below for directions on how to label 
bidirectional transistors. ) 

The command "clear" is used to clear all previous 


delay information and critical calculations from Crystal. 
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Information on inputs and outputs is not affected. When a 
clear command is used new timing calculations can be made 
based on new delay commands for the same circuit. 

Figure 4.1 shows the sequence of commands used 
to perform a timing analvsis on a l-bit combinational adder 
circuit. A check for bidirectional transistors was 
previously performed and none were found in this circuit. 
Line 2 shows the command used to invoke Crystal and lines 6 
and 8 identify the circuit inputs and outputs. The Crystal 
output lines that are enclosed in brackets on lines 5 and 7 
indicate that Crystal has completed execution of the 
previous commands. Crystal outputs a line in brackets after 
the execution of every command. In lines 10, 16 and 19 the 
two input bits, ain and bin, and the input carry bit, cin, 
are set to 1, O and 1, respectively, with delay commands. In 
Jines 14, 17 and 20 Crystal indicates the number of stages 
that had to be examined to determine the timeing delay for 
each signal. After the delay commands, the critical command 
is given in line 22. Lines 23 through 55 shows the time 
delay through the critical path in the circuit. Each node 
that is in the critical path is identified with the time 
that it is driven. In this case the critical path started at 
input pad bin, goes through the combinational logic in the 
data-path and then ends at the output pad res 198.12ns later 


(see Figure 4.2). 
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% script 


1 

2 % crystal addcl.sirn 
3 Crystal, v 2 

4 : build addc1 sim 
5. 10:00.7u 0:00.2s 30K 
Ó : Inputs ain bin cin 

7 10:00 Uu 0:00.0s 39k: 

8 : outputs res 

9 |0:00 Ou 0:00.0s 39k 

10 : delay ain O -1 

11 Marking transistor flow... 
12 Setting Vdd to 1... 

13 Setting GND to O... 

14 (28 stages examined.) 

15 [0:00.2ህ 0:00.1s 48k] 

16 : delay bin-10 

17 (41 stages examined.) 

18 10:00.1u 0:00.1s 54k) 

19 : delay cin 0 -1 

20 (26 stages examined.) 

21 |0:00.0ህ 0:00.05 548] 


22 : critical 

23 Node res is driven high at 198.12ns 

24 ...through fet at (885, 525) to Vdd after 
25 342 15 driven high at 189.31ns 

26 .. through fet at (870, 457) to Vdd after 
27 357 is driven low at 179. 77ns 

28 ...through fet at (849, 505) to GND after 
29 139 is driven high at 171.36ns 

30 .. through fet at (730, 387) to Vdd after 
31 258 is driven low at 85.70ns 

32 ...through fet at (668, 381) to 233 

2 ..through fet at (668, 376) to GND after 
34 221 is driven high at 81.04ns 

35 ...through fet at (623, 385) to Vdd after 
36 240 s driven low at 72.31ns 

37 ...through fet at (561, 379) to 225 

38 ...through fet at (561, 374) to GND after 
39 171 is driven high at 66.64ns 

40 ...through fet at (454, 387) to Vdd after 
4] 255 is driven low at 48.79ns 

42 through fet at (392, 381) to 231 

43 through fet at (392, 376) to GND after 
44 219 is driven high at 44.13ns 

45 ..through fet at (347, 385) to Vdd after 


46 237 is driven low at 35.35ns 
Figure 4.1. Crystal Delay Analysis of a 1-Bit 
Combinational Adder 
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47 ...through fet at (285, 379) to 223 


48 ...through fet at (281, 374) to GND after 
49 141 is driven high at 30.53ns 

50 ...through fet at (264, 381) to Vdd after 
51 119 is driven low at 13.17ns 

52 ...through fet at (474, 603) to GND after 
53 401 15 driven high at 8.66ns 

54 ...through fet at (518. 593) to Vdd after 


55 bin is driven low at 0.0Ons 
56 ;0:00.1u 0:00.2s 54k 

57 : clear 

58 |0:00.0u 0:00.0s 51k! 

59 : delay ain -10 

60 Marking transistor flow... 
61 Setting Vdd to 1... 

62 Setting GND to O... 

63 (26 stages examined.) 

64 10:00.1u 0:00.1s 60k] 

65 : delay bin 0 -1 

66 (52 stages examined.) 

67 10:00.1u 0:00.1s 63k) 

68 : delay cin -1 0 

69 (61 stages examined.) 

70 [0:00.2u 0:00.0s 63 


71 : critical 

72 Node res is driven high at 226.63ns 

73 .. through fet at (885, 525) to Vdd after 
74 342 is driven high at 217.82ns 

75 .. through fet at (870, 457) to Vdd after 
76 357 is driven low at 208.28ns 

77 ...through fet at (849, 505) to GND after 
18 139 is driven high at 199.87ns 

79 .. through fet at (730, 387) to Vdd after 
80 258 is driven low at 114.21ns 

81 ...through fet at (668, 381) to 233 

82 ...through fet at (668, 376) to GND after 
83 221 is driven high at 109.55ns 

84 ...through fet at (623, 385) to Vdd after 
85 240 is driven low at 100.82ns 

86 through fet at (561, 379) to 225 

87 .. through fet at (561, 374) to GND after 
88 171 is driven high at 95.15ns 

89 ...through fet at (454, 387) to Vdd after 


90 255 is driven low at 77.30ns 


Figure 4.1. Crystal 1٦7613٣۷11  7٤٭‎ 
Combinational Adder (cont. ) 
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9] ...through fet at (392, 381) to 231 


92 .. through fet at (392, 376) to GND after 
93 219 is driven high at 72.64ns 

94 ...through fet at (347, 385) to Vdd after 
95 237 is driven low at 63.86ns 

96 .. through fet at (285, 379) to 223 

97 ..through fet at (281, 374) to GND after 
08 141 is driven high at 59.04ns 

99 „through fet at (264, 381) to Vdd after 
100 — 170 is driven low at 42.03ns 

101 ...through fet at (188. 372) to GND after 
102 63 is driven high at 28.83ns 

103 ...through fet at (182. 189) to Vdd after 
104 361s driven low at 15.52ns 

105 . through fet at (885. 264) to GND after 
106 148 is driven high at 8 65ns 

107 ... through fet at (875. 408) to Vdd after 
108 cin is driven low at 0.00ns 

109 : quit 


Figure a l. Crystal Delay Analysis of 8+64 
Combinatlonalimadder (Cont: ) 
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In line 57 the clear command is used so that a new 
timing delav analvsis on the same circuit can be made. In 
lines 59, 65 and 68 new delav commands set ain, bin and cin 
tomo, 1 and O respectively. The critical command is given on 
line 71 and new critical path information is shown on lines 
(2 through 108. This time the critical path starts at the 
input pad cin, goes through the Weinberger array and the 
combinational logic in the data-path and ends at the output 
pad res 226.63ns later. After finishing a Crystal timing 
analysis the command "quit" should be used to exit the Crystal 
program. 

As can be seen from the timing analysis of the 
l-bit combinational adder, the longest critical path occurs 
when cin is driven to a low state. This is because the cin 
Signal must travel through the Weinberger array and the first 
organelle in the data-path. This circuitry is normally at a 
high state unless brought low by a low cin. A high cin causes 
no level transitions so there is no delay through the 
circuitry. For a low cin there is a low transition that 
takes approximately 30ns to propagate through the Weinberger 
array and the first organelle in the data-path. 

If the -g (filename) option is used with the 
critical command [{Ref. 11] the critical path timing information 
is printed in (filename) in a format that can be accessed by 


Caesar using the Caesar '"source" command. Each node in the 
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critical path is identified on the Caesar display screen 
along with the timing delay information. Figure 4.3 shows 
an example of how the timing delay information is displayed. 
b. Validation of Crystal's Timing Data 

(1) Introduction. Previous to this research 
effort there had been no experience at the Naval Postgraduate 
School in using Crystal to analyze circuits. The accuracy of 
the results produced by Crystal was not known. In order to 
gain confidence in Crystal a complete timing analysis of the 
l-bit combinational adder previously analyzed by Crystal was 
performed using the Mead-Conway guidelines in [Ref. 12]. 

The critical path found by Crystal was used 
to determine which transistors in the circuit were on. The 
delay calculations are divided into logic delays, wire 
delays and pad delays. 

(2) Logic Delays. The following equations were 


used to calculate the logic delay in the circuit: 


let = 2t 
Tinv = fkt 
Tnand = 2fkt 


where Tpt is the delay for a pass transistor, Tinv is the 
delay for an inverter, Tnand is the delay for a nand gate, Ut 
is the signal transit time, f is the gate fanout, and k is 
the pull-up to pull-down transistor ratio. Table I shows 


that the total logic delay, in terms of the signal transit 
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TABLET 


RESULTS OF LOGIC DELAY CAFEULATJONS 


LOGIC | k | +f | GATE DELAY | 48 OF GATES | TOTAL 
ELEMENT | ۱ | | I DELAV 
| | ۱ ۱ | 
( ۱ ' ۱ 
inverter | 4 | 1 ۱ 41 ۱ 1 | 4t 
۱ | | ۱ | 
۱ ۱ ۱ | ۱ 
pass | l [ ۱ ۱ 
transistor! - 1 ፦ | 2t | 1 ۱ zt 
۱ | ۱ | ۱ 
| | | | 
nand gate | 8 | 3 | 48t ۱ 1 ۱ 481 
| | ! ۱ | 
| i | | ! 
nand gate | 4 | 2 | lót | ጋ | 481 
! | | | | 
۱ | | | ۱ 
nand gate | 4 | 1 ۱ at | 9 | 401 
| | ۱ ۱ ۱ 


TOTAL LOGIC DELAY = 142t 
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time, is 142t. Reference 12 states that the signal transit 
time equals 0.3ns for a six micron design (lambda equals 3 
microns) and the 1-1011 combinational adder is a 4 micron 
design (lambda equals 2 micron). The transit time is scaled 
down by dividing by the scale factor 1.5 (6 microns divided 
by 4 microns). This gives a transit time of 0.2ns. Using 
mimo value; a logic delay Of Zöns is obtained. This value 
is doubled to account for stray capacitance in the circuit 
giving a total logic delay of 56ns. 

(3) Wire Delays. From Figure 4.2 it can be seen 
that there are long metal and polysilicon runs in the circuit. 
The total length of metal runs from the input pad to the 
Weinberger array and from the data-path to the output pad is 
approximately 3.9mm. The total length of polysilicon runs 
from the input pad to the Weinberger array, from the Weinberger 
array to the data-path and from the data-path to the output 
pad is approximately 2.1mm. There are no significant 
diffusion runs in the circuit. 

Reference 12, page 231, states that metal line 
delays equal O.1ns/10mm and that polysilicon line delays 
equal 200.0ns/10mm. Using these values a wire delay of 42ns 
is calculated. The wire delays used in the above calculations 
are based on a 6 micron design. When lambda is scaled down 
the capacitance per unit length of wire stays constant but 
the resistance scales up quadratically. Since lambda is 


sealed down by a factor of 1.5 the wire resistance scales 


TO 


up by 2.25. Multiplying the above wire delay by 2.25 gives 
a total wire delay of 95ns. 

(4) Pad Delays. The signal delay for the output 
pad is approximately 13ns [Ref. 13]. Due to the lack of 
available information on the signal delay for the input pad 
the delay calculated by Crystal of 8ns will be used in this 
comparison. This gives a total pad delay of 21ns. 

(5) Comparison of Results. In Table II a 
comparison is given of the circuit delays calculated using 
the Mead-Conway methods and those calculated by Crystal. 

The logic delays calculated using the Mead-Conway methods are 
less than that calculated by Crystal because delays caused by 
the polysilicon wires connecting the gates together in the 
data-path are not taken into account in the Mead-Conway 
calculations. The total circuit delay of 172ns calculated 
by the Mead-Conway methods is in close agreement with the 
226.63ns delay calculated by Crystal. It can be concluded 
that the circuit delay information given by Crystal is 
accurate and can be used with confidence. 

3. Pipeline Circuits 

a. Labeling Bidirectional Transistors 

Before a timing analysis can be done on a MacPitts 
design the bidirectional transistors in the circuit must be 
identified by using the check command of Crystal and properly 
labeled so that Crystal does not have to determine the 


direction of signal flow through these transistors. The 
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TABLE II 


COMPARISON OF MEAD-CONWAY AND 
۰" ) 1756 


۱ Mead-Conwav | Crystal 
۱ ۱ 
۱ ۱ 
logic delay ۱ Јӧп5 ۱ 93.79ns 
۱ ۱ 
١ ١ 
wire delav ۱ 95ns ۱ 105.84ns 
۱ ۱ 
| | 
pad delay | 2105 ۱ z?ns 
۱ ( 
| i 
total delay ۱ 172ns ۱ 226.۸055 
۱ | 
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Macpitts data storage elements, the register and the flag, 
each have bidirectional transistors in them. The register 
has five bidirectional transistors in each register cell and 
the flag has one. 

The procedure used to show the direction of 
signal flow through a bidirectional transistor is to attach a 
transistor attribute label to the transistor using Caesar. A 


transistor attribute label has the following form: 


Cr:(label)$ 


The label must be placed exactly in the middle of the source 
or drain edge of the gate region of the transistor. This is 
done by placing the center of the Caesar bounding box over 
the center of the source or drain edge of the gate and 


typing the folloving Caesar command : 


la Cr:(label)$ center 


Figure 4.4 shovs a stipple plot of a bidirectional 
transistor. The center of the bounding box is on the center 
Of the source edge of the gate region and the transistor 
attribute label Cr:A$ has been affixed to this point. 

If a bidirectional transisotr is not electrically 
connected to any other bidirectional transistor the transistor 
attribute label should be placed on the source edge of the 
gate. If two or more bidirectional transistors are 


electrically connected the same attribute label should be 
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Placement of a Transistor Attribute 
mabel om a BidurcetionalTransistor 


Figure 4.4. 
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placed on each transistor on the side of the gate that 
shares the electrical connection to the other transistors. 
Bidirectional transistors that are not electrically 
connected should have different labels. 

Figure 4.5 shows the stipple plot of a register 
cell that has five bidirectional transistors labeled with 
transistor attributes. The bidirectional transistor labeled 
Cr:A$ is not electrically connected to any other bidirectional 
transistor. The source side of the gate has been labeled. 
The two transistors labeled Cr:B$ are the pull-up and pull- 
down transistors of an inverter. Due to the unusual MacPitts 
inverter structure Crystal could not determine the direction 
Of signal flow and identified the pull-up and pull-down 
transistors as bidirectional. Since both of the transistors 
are electrically connected the same transistor attribute 
label has been placed on the side of the gates that are 
connected. Transistors labeled Cr:C$ are the pull-up and 
pull-down transistors of another inverter. Figure 4.6 shows 
the transistor attribute labeling for the one bidirectional 
transistor in a flag. 

b. Crystal Commands for Clocked Circuits 

(1) Problems Analyzing a MacPitts Design. 
Crystal was designed to be used for a non-overlapping 
clocking scheme. The overlapping clock phases and the five 
segment period of the MacPitts clock (see Figure 2.2) make 
the MacPitts pipeline adder circuit much more difficult to 


analyze. 
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When Crystal does a timing analysis of a 
clocked circuit it is assumed that each clock phase (or 
clock period segment in the case of a MacPitts design) is 
lone enough for the combinational logic in the circuit to 
Settle. But in a MacPitts circuit the first and second 
clock period segments, tand t2, are used for the settling 
time of the combinational logic. Crystal will give an 
overly long delay for tl of a MacPitts design because all of 
the logic propagation delay will be assigned to this section. 

Another problem is that it will not be 
possible to determine the logic delay of any stage in the 
pipeline if the delay of the clock phase signals phia, phib 
and phic getting to the registers or flags is longer than 
the stage logic delays. This is because Crystal only gives 
the timing delay for the critical or longest path in the 
Circuit: 

The problems are solved by dividing the 
timing analvsis of the Macpitts pipeline design into two 
parts. First the clocked reġisters and flags of the chip 
are analvzed for the timing delav of the input clock phase 
signals and then the combinational logic in each pipeline 
stage is analyzed to determine the slowest stage in the 
pipeline system. 

(2) Register and Flag Delays. The first step 
in performing a timing analysis of the clocked registers and 


7575115. ያስ ስን 6 1215621115 circuit using Caesar. All 


5,3 


logic except the flags block, registers and the ground, 
power and clock pads is deleted from the circuit. This is 
done so that Crystal does not use the extraneous circuitry 
in determining the critical path through the registers and 
flags. Next, the registers are deleted from the circuit 
because the clock phase signals will take longer to reach 
the flags than the registers. This is because the registers 
are closer to the clock pads on the clock bus and also the 
clock phase signals are further delayed in the flag block 
by two inverters. Finally, the input and output lines of 
each flag are disconnected from the extraneous data lines 
going to the Weinberger array, if not already done so, and 
the input and output wires of each flag are labeled (see 
Figure 4.7). Figure 4.8 shows what the edited circuit 
looks like for 4-bit S-stage pipeline adder. 

The timing analysis of a clocked circuit 
is Similar to that of a combinational circuit except that 
there is a separate set of delay and critical commands for 
each clock phase. For the MacPitts overlapping clock there 
is a separate set of delay and critical commands for each of 
the five segments of the clock period. The clear command is 
used between each set of delay and critical commands. Prior 
to the delay commands, the clock phases that do not change 
state during a section of the clock period should be set to 
the high or low state using the set command [Ref. 11]. Inputs 


that are set to a state are not used by Crystal to determine 
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the critical path because they do not have a state 
transition. Also, if delay commands for inputs other than 
the clock phases are not used Crystal assumes that the 
input signals stabilize long before the start of the clock 
period. Crystal then determines the longest critical 

path in the circuit no matter what the state of the 
non-delayed inputs are. In Figure 4.9 the Crystal commands 
used to analyze the clock phase delays through the flags 
block are listed. 

(3) Pipeline Stage Delays. A separate Crystal 
timing analysis must be performed on the combinational logic 
in each pipeline stage in order to obtain propagation delays 
for each stage. First, the input and output signals of each 
stage must be determined. Input signals come from input pads 
or from register or flag outputs. Output signals are inputs 
to registers, flags or output pads. Next, using Caesar, the 
input and output lines of each stage are disconnected from 
any logic elements that are not part of that stage. This is 
done so that Crystal does not use circuitry that is not part 
of a stage in determining the critical path through that 
stage. Labels are then placed on all input and output lines. 

Figures 4.10 and 4.11 show two different 
circuits before they are edited using the above procedure 
and Figures 4.12 and 4.13 show the circuits after they have 
been edited. In Figure 4.12 node cl is the output line of 


stage 1 of the pipeline and has been disconnected from the 
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^ script 

^ Crystal addp4.sim 
: inputs in416:1» phia phib phic 
፡ outputs out<16:1> 
፡ set 1 phia phic 

: delay phib O -1 

: critical 

$ clear 

set 1 phia 

delay phib -1 О 
delay phic ፦1 0 
critical 

clear 

set 0 phib phic 
delay phia –1 О 
critical 

clear 

set 0 phib phic 

: delay phia Ü -1 

: critical 

clear 

set 1 phia 

set 0 phib 

delay phic O -1 
critical 

quit 


Figure 4.9. Crystal Commands: Timing Delay of 
Clock Phases 
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Flag Cells After Editing 


Figure 4.15. 
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input of the storage register cell. Node dl is the input line 
of stage 2 and has been disconnected from the output of the 
register cell. In Figure 4.13 nodes ol and pl are output lines 
of stage 4 and have been disconnected from the input lines of 
the storage flags. Nodes o2 and p2 are inputs of stage 5 and 
have been disconnected from the output lines of the flags. 
After all stages have been isolated and input 
and output lines labeled a .cif file is created using Caesar 
and then a .sim file is created using Mextra. A Crystal timing 
analysis is then performed on each stage in the pipeline 
using the same procedure as used when performing an analysis 


Ona combinational logic circuit. 


B. DESIGN COMPARISONS 

Three important parameters used when comparing the 
performance of integrated circuit designs are chip size, 
power and speed. 

In order to determine the speed of a MacPitts pipeline 
design the logic delay in each stage and the clock phase 
delays must be compared. The propagation time of the slowest 
stage in the pipeline is compared to the sum of the first 
two segments of the clock period tl and t2. This is because 
all logic propagation in the circuit must be settled before 
t3 when the inputs to all storage registers and flags are 
sampled. The slowest of these times is then added to t3, t4 


and t5 to determine the clock period. 
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Table III shows the propagation delay for each stage of 
a d-bit pipeline adder and an 8-bit pipeline adder (4 micron 
designs). The long delavs in stages 2 through 5 of each 
adder are caused bv long delavs through the Weinberger arrav 
and the long high-resistance polvsilicon runs carrving data 
from the registers to the arrav and carrving data back and 
forth from the flags block to the array. The delavs through 
the Weinberger arrav are due to three factors. First, the 
inputs to the arrav from the registers and flags are driven 
by k=4 inverters. These inverters, which are not super 
buffered, drive up to five nor gates in the array thus 
adding substantial delay to the stage [Ref. 12]. This delay 
could be considerably reduced if the outputs of all 
registers and flags were super buffered. Second, the 
propagation delay in the array is high due to the large 
number of nested NOR gates in the array. In some cases up to 
five NOR gates are nested to perform a particular function 
(i.e. an XOR function). This is much more delay than would 
be found in the two level nesting of a PLA. The excessive 
delays in the array are also caused by the long polysilicon 
lines that connect the inputs and outputs of the NOR gates. 
In some cases an output of a NOR gate is connected to the 
input of another NOR gate by a polysilicon wire that runs 
nearly the total width of the array. The increase in stage 
propagation delay of the 8-bit adder when compared to the 


4-bit adder is due to the increased size of the Weinberger 
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8-81 ۲ PIPELINE ADDER 


91.87ns 


25707:7365 


222.89ns 


250 . 5305 


20 5 


TAS 111 


PIPELINE STAGE DELAY 


4-BIT PIPELINE ADDER 


33.5?ns 


1 26 ۰14٩0 


10 5 


142.70ns 


141.63п5 


STAGE 


array of the 8-bit adder and not due to a poorly designed 
pipeline chip. 

In Table IV the delav in each clock period segment is 
listed. The long delavs are due to the input clock pads not 
being super buffered. One k-4 inverter on each clock pad 
must drive eight k=4 inverters; one inverter for each of the 
seven registers and one input inverter to the flags block. 
Each of the input inverters of the registers and flag block 
cause further delay because they are not super buffered but 
must drive many register cells and flags. In the case of the 
8-bit pipeline adder one k=4 inverter must drive twenty- 
seven flags. Additional delay is caused by the long clock 
bus. The clock signals must traverse a length nearly equal 
to the height and width of the chip before reaching the 
flags block. If the clock input pads, the input inverters, 
all registers and the flags block were super buffered the 
timing delay of each clock period segment would be 
substantially improved. 

Comparing Tables III and IV it can be seen that the 
propagation delays through clock period segments tl and t2 
are greater than the slowest stage for both the 4-bit and 
8-bit pipeline adders. Thus, the clock period is found ከሃ 
adding tl through t5. The clock period of the 4-bit 5-stage 
pipeline adder is 486.74ns (2.055 MHz clock) and the clock 
period of the 8-bit 5-stage pipeline adder is 706.32ns 


(1.415 MHz clock). 
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TRB IV 


00 UT LL DEL YT 


CLOCK PERIOD 4-BIT PIPELINE 8-BIT PIPELINE 


۱ ۱ 
SEGMENT ۱ ADDER ۱ ADDER 

۱ ۱ 
۱ ۱ 

ti ۱ 1 1 6 . 0۲ 5 ۱ 170 . 46۳ 5 
۱ ۱ 
۱ ۱ 

t2 i 66.62ns ۱ 1 02 ۰6۳ 
۱ ۱ 
۱ ۱ 

۳3 ۱ 82.93ns ۱ 1 0 6 . 95۳5 
۱ | 
| | 

ta ۱ 100.87ns ۱ 153.56ns 
۱ ۱ 
۱ ۱ 

t5 ۱ 1 20 . 0 505 ١ 172.68ns 
۱ ۱ 


OF 


Table V lists the chip size, power and speed of several 
4 micron combinational and pipeline adder circuits designed 
by MacPitts. In addition, a 16-bit 4-stage pipeline adder 
that was designed by hand is also listed [Ref. 5]. (See 
Figure 4.14.) 

Chip size and vorst case static pover consumption are 
standard outputs from the MacPitts silicon compiler. The 
required pover for the hand designed adder vas found by 
using a program called povest that makes an estimate of the 
DC pover required in a circuit based on the number of 
enhancement and depletion mode transistors in the circuit. 
Povest uses a .sim file as input and an output of the 
average DC pover (based on one-half of the transistors being 
on at any time) and the maximum DC pover (based on all 
transistors being on) is printed on the terminal screen. The 
value of pover listed in Table V for the hand designed adder 


is the maximum DC pover. The command to run povest is: 


povest -p € filename.sim 


For comparison, povest vas run on all of the MacPitts 
designs and the pover estimates calculated by povest and 
MacPitts vere, on the average vithin 102 of each other. 

All chip speed values listed in Table V vere calculated 
by Crystal. Reference 5 estimates the clock speed of the 
16-bit 4-stage pipeline adder as 8 MHz. This is seven times 


faster than the 1.141 MHZ calculated by Crystal. The reason 
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Figure 4.14. Hand Designed 16-Bit 4-Stage Pipeline 
Adder 
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for the discrepancv is that reference 5 does not take in 
account that the clock pads in the circuit, which are not 
super buffered, must drive a large number of pass transistors. 
Clock pad phia drives 138 pass transistors that clock data 
mea the five PLAS in the Circuit while clock pad phib drives 
۰۰۱۶۰۹5 transistors that clock data out of the PLAs. 

Another interesting observation about the hand designed 
circuit from reference 5 is that when the circuit is examined 
using Caesar a misalignment of one-half lambda between the 
data, power and ground buses going into the PLAs and the PLA 
blocks is found. As seen in Figure 4.15, the bus 
misalignments are not enough to disconnect any wires. 

As expected, when the combinational adder circuits were 
converted to pipeline circuits the chip size and power 
increased, but the increase in chip throughput (or speed) 
anticipated in a pipeline design did not occur. The slow 
circuitry of the Weinberger array, non-super buffered clock 
pads and long polysilicon runs in the MacPitts pipeline 
circuits caused excessive delays and decreased performance 
below that of the combinational circuits. The excessive 
delays could be reduced if the Weinberger array was redesigned 
to reduce the NOR gate nesting or replaced by a PLA, if all 
input lines to the array were super buffered and if the long 
polysilison runs were replaced with metal or diffusion runs. 
If the design of a 16-bit pipeline adder were possible it is 


expected that this design would have a clock speed less than 
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Misalignment of Chip Buses and PLA for 16-Bit 4-Stage 


Pipeline Adder Circuit 


Figure 4.15. 
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and chip area much larger than the hand designed adder. Even 
with fast logic in each pipeline stage and super buffered 
clocks the fact that the last three segments of the MacPitts 
clock period cannot be used for logic propagation insures 
that the MacPitts pipeline designs will be slower than any 


well designed hand-crafted circuit. 
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V. MACPITTS DESIGN ERRORS 


A. INTRODUCTION 

Although the MacPitts silicon compiler is expected to 
generate error free designs, several cases have been found 
where design errors have been made. These design errors 
fall into two categories: wiring errors and alignment 
errors. Wiring errors have occurred when wires become 
electrically connected when they should not be and alignment 
errors have occurred when circuitry has been placed 
incorrectly on the chip so that it does not align properly 


۳۷٢1:1 34.1596210" ey” 


B. WIRING ERRORS 
1. -Description of Erron: 

A case of a fatal wiring error was discovered where 
the MacPitts compiler electrically connected all three clock 
lines that run in the clock bus below the data-path to a 
data line that was running from the data-path to the 
Weinberger array. This error was found to occur whenever the 
last organelle of the data-path or sequencer is the 
organelle used by the compiler to transfer data from the 
data-path to the Weinberger array (see Figure 5.1). The 
vertical polysilicon data wire of this organelle runs 


parallel and only four lambda away from a large ground bus 
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line that is alwavs placed on the right edge of the 
data-path and sequencer. The horizontal clock bus must make 
metal-to-polysilicon polysilicon-to-metal vias over this 
ground bus. Since the data wire runs so close to the ground 
bus it crosses the clock bus at the metal-to-polysilicon via 
and becomes electrically connected to the clock lines (see 
Figure 5.2). This error was also found by Kelly (as 
mentioned in [Ref. 4]) when he used MacPitts to produce a 
butterfly switching element chip at MIT Lincoln Laboratory. 
Unfortunately, this error cannot be identified when a design 
rule check is made on the circuit because no design rules 
are violated. 

It is not difficult to predict when this wiring error 
is going to occur in the data-path and to correct it when it 
is found. A programmer should first examine the MacPitts 
.mac program to identify all statements that cause word size 
operations to be performed and cause the compiler to produce 
an organelle in the data-path. If the last word size 
statement in the .mac program uses the "bit" data-path 


function of the form: 
(bit <Kbit-position)> <integer-expression 2) 


the organelle that transfers data from the data-path to the 
Weinberger array will be placed on the right edge of the 
data-path and a fatal wiring error will occur. (See [Ref. 2] 


for a description of the bit function.) 
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Close-up of Wiring Error 


Figure 9.2. 
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It is more difficult to predict when dün drm 
error is going to occur in the sequencer than in the data- 
path. Reference 8 contains details of sequencer wiring 
errors. 

2. Correction Of Wiring Errors 

The wiring errors in the data-path and sequencer can 
be easily corrected using the Caesar VLSI circuit editor. 
The Caesar file that contains the last organelle of the 
data-path or the sequencer must first be identified. This 
file is then edited using Caesar and the right one or two 
data lines are rerouted around the clock bus via as shown in 
Figure 2777 

If it has been determined that the "bit" function is 
the last vork size statement in the .mac program the steps 
used in the MacPitts design cvcle of a 5 micron design that 
are listed on page 68 of reference 4 should be modified as 
follows: 

1. Generate a 5 micron .cif file as stated. The following 
command will create several Caesar files each containing 
a description of part of the design. (Ignore user 
extension warning). 

۸ cif2ca -1 250 filename.cif 

2. Rename the top level Caesar file. 

4 mv project.ca filename.ca 


3. Use Caesar to identify the Caesar file, symbol xx.ca, 
that has the wiring in it. 
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Correction of Niring- -Error 
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^ caesar filename 


The Caesar file for the complete data-path/sequencer 
may have to be edited in Caesar to identify the file 
that contains the last organelle fo the data-path/ 
sequencer where the wiring error is located. Caesar 
can be used to reroute the data lines around the clock 
bus via. 


Edit the top level Caesar file again and create a new 
Cit TIS 


sa 
cif 248 
q 
Next, perform a design rule check of the new .cif file. 
(Note that the cif command line ends in -qnq not -gng 
in the following command). 
% cif filename.cif -qnq 
%4 cll filename.co 
م‎ Arc filename.sco 
To perform an event simulation on the modified 5 micron 
design the procedure listed on page 71 of reference 4 
for the 4 micron design should be followed to affix 
labels to the bonding pads, obtain a node extract, 
and start the simulation run.  Insure that the 248 
scale is used when creating a new .cif file of a 5 


micron design in Caesar (see page 96 of reference 4). 


For a micron design that contains wiring errors 


the MacPitts design cycle listed on page 70 of reference 4 


should be followed. The wiring errors can be corrected, 


using the above procedure, at the same time that the labels 


are affixed to the bonding pads. 
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C. ALIGNMENT ERRORS 

Alignment errors have been found in the flags block and 
the Weinberger array of several different designs. Most of 
the alignment errors that were found were identified by 
performing a design rule check on the circuit that contained 
the errors. The design rule check program is able to find 
the errors because in most cases metal-to-metal, 
polysilicon-to-polysilicon or diffusion-to diffusion 
separation errors occur. 

In the flags block the errors have occurred when the 
compiler places the flags block on the chip so that the 
internal clock, ground and data buses of the block do not 
properly align with the chip clock, ground and data buses. 
The misalignment of the flags block has been found in three 
designs; the 4 micron 4-bit 5-stage pipeline adder and both 
the 4 micron and 5 micron 8-bit S-stage pipeline adders. In 
each case the circuitry inside the flags block has been 
designed correctly but the block itself has been placed 
incorrectly on the chip. 

In the case of the 4 micron 4-bit S-stage pipeline 
adder the flags block was placed two lambda too high in the 
circuit. Figure 5.4 shows that the flags block ground bus 
does not properly connect with the chip ground bus. In 
Figure 5.5 the metal-polysilicon contacts of the flags block 


clock lines do not properly align with the metal lines of 
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the chip clock bus. The flags block of the 1670172 Tik 
o-Stage pipeline adder is placed two lambda too high and 
one-half lambda too far left on the chip. Figure 5.6 shows 
the two lambda misalignment of the flags block clock lines 
and a one-half lambda misalignment of the flags block data 
lines. The flags block data lines have a two lambda overlap 
with the chip data lines so even with a two lambda vertical 
flags block misalignment the data lines are still 
electrically connected. A flags block miSalignment of eight 
lambda in the vertical direction was found in the 4 micron 
ö-bit S-stage pipeline adder. Figure 5.7 shows the clock 
and data bus alignment errors for this circuit. 

The Weinberger array alignment errors are more complex 
than the flags block errors. In addition to errors where 
the Weinberger array is placed incorrectly on the chip there 
are also some internal alignment errors in the array. Figure 
0.8 Shows three misalignments of the Weinberger array buses 
and the chip buses. Also shown is one internal misalignment 
where a diffusion line is not properly connected to a pull-up 
transistor. Weinberger array alignment errors will be 
treated in detail in reference 8. 

The cause of alignment errors is not yet understood. 
Alignment errors have only been found in MacPitts designs 
since the Macpitts compiler was installed under the UNIX 4.2 


operating system. No alignment errors were found when the 
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Figure 5.4. 
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8-Bit Stage Pipeline Adder 


ጋ Micron 


Clock and Data Bus Misalignment Errors 


Figure 9.6. 
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8-Bit 5-Stage Pipeline Adder 
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Clock and Data Bus Misalignment Errors: 


ء57 ۶7617۶6 


02722222222202222222022222 


AOO ULG ራራ A 


WL MM 
LLM MLL LLL 





= Ы Y 


rh 


` 
› ኬኣ 
гә, 
4 
” 
хәҸ 
2 


ZZ 8 6 72 








ሃሪ LLL m. 224, — 


7 


nment Errors 


ig 


er Array Al 


ES 


inbe 


We 


8 


ure 5. 


Fr 





SSSSSS 


ቀ ረ 


رو 
ፖ d‏ 
G‏ 











ሠ 








ZZ 





٦ 
۰ .. “ 
4: پیر‎ ` Xo 
` N 
“< D ኣባ 
ҹә “ `N NC 
€ 1458 ` “NN 
^ aad NNN ኣ 
4 .. کہ‎ 
аг ааа, SA. 
ир 3... WKY 
<. kə AN 
e" N 


و 


77 






ባባ 80! *ጫ ከጣ ካህ ከጣ ከ4 ከ4 ከጣ ነካ یر جا ود مد دہ ود بن‎ ካኳ ود ود مد وا ود‎ ኮካ ከ4 
ب(‎ ħa ኾነ ኤላ ኮዛ N ›= ካብ a Y و مد يد‎ ha la وھ‎ 4 16 ቅጣ 59 ود‎ Ma Ta a 
ee thy a Wa ha la Ҹә fiq, ود وذ‎ ha يد وذ وذ وت مت وط‎ by by by by Uo ty Py Thy 
دہ‎ a Ka fa ہد ود ود بد وط یہ مد بد مد ود‎ ከ4 My By Oy ኮጤ ካ ከኣ ከላ ከላ ኸጣ 
x^ h ha پد ب‎ ha وت ود‎ ba ہو‎ ha LLLI 












ና UL ና 
“84... عبط طل‎ k ii a Te la Ya lə: 
۳ Se 
I кк 
A ملم‎ he heed eth eed 
سم‎ 85. ጢ. “ላ ኮዛ a la e e fiq, 5. “ዝ 98 ካካ ኸጫዔ ካጤ ካ جھ ود‎ a ha چھ بطا ود ود ود‎ 
ክኑ NN 5 55 ካ ካህ 58 ከጤ 85 ካ5 ካይ ካጣ የጫ A a Oy ty دہ‎ ካክ ከጨ ኩላ ከዱ 
аһ ` 55 0 rə ور‎ 
ارس‎ ቁላ ቅላ ለ ኮፍ ħa ta Ta ħa Тут Тулу a. 


Na لیڈ‎ a ha ha Na kata ta ħala tad 





compiler was installed under the UNIX 4.1 operating system. 
It is thought that the version of the Franz LISP compiler 
installed under UNIX 4.2 may be causing an unexpected 
roundoff or truncation when the compiler calculates the 
vertical and horizontal coordinates used to place circüitry 
on the chip. Alignment errors can be corrected by using 


Caesar. 


118 


VI. CONCLUSION 


A. SUMMARY 

The objectives of this thesis were to determine what 
the basic circuits in MacPitts designs are and how they are 
used, to make performance comparisons of several different 
adder designs with a hand-crafted adder design and to obtain 
a better understanding of the MacPitts interpreter. 

The basic building blocks that the MacPitts compiler 
uses in circuits were found to be the data-path, the 
sequencer, the flags block and the Weinberger array. The 
circuit density and speed of the building blocks were found 
to be low. This was expected since Siskind was quoted in 
reference 14 as stating that optimizing chip performance was 
not a primary MacPitts design goal. The functional 
description of the circuit in the .mac program was found to 
have a direct relationship to the circuit structures that 
the compiler used to design the circuit. 

It was found that circuits designed by the MacPitts 
silicon compiler are very inefficient in terms of the amount 
of circuitry per chip area and that the speed of a MacPitts 
circuit is slow compared to hand-crafted designs. The 
significant advantage that MacPitts-designed circuits have 
over hand-crafted circuits is the reduction in time required 


to design the circuit. This makes silicon compilers an 


ијә 


attractive alternative when the time required to design a 
clrcuit is a more important consideration than the speed or 
size of the circuit. Until the cause of the alignment errors 
discussed in Chapter V is found and corrected, all Machitts 
designs must be inspected carefully for the possibility of 
alignment errors. Unexpectedly, it was also found that 
combinational adder circuits were faster than pipeline adder 
clrcuits because of the MacPitts clocking scheme and the 
timing delay caused by the non-super buffered clock lines 
driving the registers and flags. 

Appendix À gives a complete list of all the MacPitts 
interpreter commands and an explanation of their use. In 
addition, all interpreter error statements and their 
definitions are listed. 

In 1983 the developers of the MacPitts silicon compiler 
(Siskind, Southard, and Crouch [Ref. 3]) left MIT Lincoln 
Laboratory and formed their own company, MataLogic, Inc., to 
produce a commercial silicon compiler. MetaLogic's current 
compiler, called MataSyn, is a redesigned version of the 
MacPitts compiler. Most of the design limitations of 
MacPitts have been eliminated in MetaSyn. Two of the more 
significant improvements in MetaSyn are the redesign of the 
interpreter and the Weinberger array. The new interpreter, 
now called the simulator, is very flexible and user friendly 
and has few of the limitations of the MacPitts interpreter 


listed in Appendix A. The Weinberger array has been 
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redesigned to improve circuit speed. A PLA is now used for 
all chip control funetions and the Weinberger array is used 
only for bit sized boolean functions. The recommended 
MacPitts improvements listed below, except for #2, #5 and 


#6, have been incorporated in MetaSyn. 


B. RECOMMENDATIONS 
The following recommendations should be considered to 
improve the MacPitts Silicon Compiler: 
1. Add super buffers to all input pads. 


2. Add super buffers to all data lines leaving the data- 
path, sequencer and flags block, and to all clock 
lines driving the registers and flags. 


3. Redesign the design from to allow pads on all sides. 


4. Use channel routing instead of river routing to reduce 
the complexity of the Weinberger array. 


5. Implement a faster algorithm for design of the 
Weinberger array. 


6. Redesign the registers and flags so that a more 
conventional two-phase clock can be used in MacPitts 
designs. This will eliminate the circuit delay of the 
last three segments of the MacPitts clock that can not 
be used for logic propagation. 


7. Redesign the interpreter to make it more user friendly 
and able to handle large designs containing many flags, 
ports, signals, registers and processes as discussed 
in Appendix A. 


8. As mentioned in Chapter III, a data-path organelle 
should be designed to set and shift data bits of a 
d3 WOrd SO that data can be transferred from the 
Weinberger array to the data-path. 
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Redesign the data-path so that data can enter or leave 
the data-path from either the left or right Side to 
reduce the length of wire runs from the pads. 


Redesign the flags block and the data-path organelles 
to save wasted space illustrated in Chapter 311. 
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10. 


APPENDIX A 


THE MACPITTS INTERPRETER 


A. USE OF THE INTERPRETER 

The MacPitts interpreter is used to test for syntax and 
logical errors in the .mac file. The interpreter creates a 
functional environment of the integrated circuit from the 
emac file without actually designing the circuit. This 
functional environment can then be simulated. 

The interpreter can be invoked by using the following 


command: 


^ macpitts filename int herald 


Filename is the filename of the .mac file without the .mac 
extension. Herald is used so that as the MacPitts silicon 
compiler reaches a milestone as it 1S processing the .mac 
file, messages are printed to the terminal. Although the 
herald statement can be omitted the milestone messages 
assure the programmer that the silicon compiler is still 
processing the mac file on long compile runs. 

When the interpreter is readv to start processing a 
simulation run all registers, ports, processes, flags and 
Signals defined in the .mac file are listed in a table 
on the terminal screen along with their values (see 
Figure A.1). The first thirty-six items displayed in the 


table are labeled from O-9 and a-z. The MacPitts 


12: 


REGISTERS FLAGS 


1: a0 = undefined 9: q1 = undefined 
2: al = undefined 65 ri = undefined 
3: sto = undefined 7: carry = undefined 
4: w2 = undefined 

SIGNALS 
PORTS s reset = undefined 
55 ain = undefined c: cin = undefined 
9: bin = tri-state d: cout = undefined 
a: res = undefined 
PROCESSES 
e: countup = (undefined) 
f: countdown = (undefined) 
Ready Value=0 


Figure A.1. The Interpreter Screen Display 
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interpreter does not have the abilitv to label more than 
thirtv-six items so items thirtv-seven and higher are not 
labeled. At the bottom of the screen a command line is 
displayed. The command line shows the status of the 

۳۱۱۱۳ ۰۲ 9۵ at any timcosubossible command line displays are 
Ready, indicating that the interpreter is ready to accept a 
command and 0 indicating that the interpreter 15 
Pemformingema ۲ 1۱۳۱ ۰ 10۵۱۸۸1 ۵11 9 iron "of the chip through one 
or more clock cycles. On the bottom right of the screen the 
contents of a special interpreter register called "value" 
are shown. The value register is used to set ports and 
registers to particular values and also indicates the number 
of clock cycles a simulation run will execute. 

There is one serious limitation with the interpreter 
that causes it to be unusable for many large chip designs. 
If the total number of registers, ports and processes 
defined in the .mac file is greater than twenty there will 
be too many items for the interpreter to display on the 
right side of the terminal screen at once (see Figure A.1). 
Also, if the total number of flags and signals is greater 
than twenty-two there will be too many items for the 
interpreter to display on the left side of the terminal 
screen at once. Unfortunately, the interpreter continues to 
try to display those items that will not fit on the screen. 


Since the interpreter is never able to display all items 


control of the terminal is never turned over to the 
programmer for a simulation run. The only way to stop the 
interpreter if this happens is to abort the run by typing 8 
Cont rol eae 

The interpreter uses information from three different 
locations to determine the values of all registers, ports; 
Signals, flags and processes during a Simulation run. The 
first location is the "console" where the programmer, using 
the terminal keyboard, can specify the values of the above 
items. The second location is the functional environment of 
the circuit, called the "chip'. This is where the 
interpreter uses input information from the programmer to 
determine the values of the above items. The last location 
is called the "environment" and is a programmer specified 
functional environment that the programmer may have the 
interpreter use during simulation (see the "e" command 


below). 


B. INTERPRETER COMMANDS 

All interpreter commands are screen oriented which 
means the command is executed as soon as the key is pressed 
and a carriage return is not necessary. Table VI gives a 
list of the interpreter commands. These commands can be 
displayed on the screen by typing "?". 

Most of the interpreter commands are self-explanatory 


but several require additional explanation. Several commands 
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TABLE VI 


MACPITTS INTERPRETER COMMAND SUMMARV 


? - This menu 
^L = Repaint screen 
p - Put interpreter state to <file-name>.int 
و‎ - Get interpreter state from <file-name>.int 
e — Enable/Dısable environment from (file-name).env 
c — Clock system (value) cvcle(s) 
l — Escape to Lisp system 
q - Quit 
J - Move cursor down 
K - Move cursor up 
G <tag> - Move cursor to <tag> 
t Set flag, input signal, or i/o signal Ко 
f — Set flag, input signal, or i/o signal to f 
s -— Set register, input port, or i/o port to (value) 
u — Set register, flag, input port, i/o port, 
input signal, or i/o signal to undefined 
T - Set i/o port or i/o signal to tri-state 
x = Clear «value»? register to О 
— — Negate «value»? register 
(digit) — Enter (digit) into (value) register 


affect only one item on the screen. Before these commands 
can be used the item to be affected must be highlighted by 


a ff 
2 


the inverse cursor. The "J "k" and "G" commands are used 
to move the cursor around the screen. If an adm3a terminal 
is used instead of a vt100 terminal the inverse cursor is 

not displayed and only the "G" command can be used to place 
the "invisible" cursor over the item to be affected. 

When the registers, ports, processes, flags and signals 
are initially displayed on the screen by the interpreter 
their values are undefined or tri-state if a tri-state port 
or signal is defined in the .mac file. (See Reference 2 for 
an explanation of the different register, port and signal 
types.) Before a simulation run is made all input and i/o 
ports and signals must be set to some initial value. The "t" 
and "f" commands are used to set input or i/o signals to 
true or false, respectively. The "s" command is used to set 
an input or i/o port to the value stored in the value 
register. Another command, the "T" command, can also affect 
the values of input or i/o ports and signals but has proven 
to be not very useful. If the "T" command is ነ on an 
input port or signal, or an i/o port or signal that is used 
for input only in the .mac program) theportmor 51822021 tn. 
will be set to a high impedance state (tri-state). The port 
or signal value will stay at high impedance until explicitly 
set to some value by the programmer using the "s", "t", or 


"f" commands. If an i/o port or signal is used for output 
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Only or both input and output in the .mac preġiam the 'T' 


command will cause the port or signal value to change to 


undefined. 
The የር command is used to simulate the functional 
environment of the chnipema lhe number of clock cycles 


Simulated in one simulation run is indicated by the value 
1551562: TT Oot I isistored in the value register only 
one clock cycle will be simulated. 

After a simulation run it may be desirable to store the 
values of all items displayed on the screen. This can be 
done by using the "p" command. The state of the functional 
environment is sayed in a file called filename.int where 
filename is the same as the filename.mac file. If more than 
one state is to be saved the programmer must login on 
another terminal and rename the .int file after each state 
is saved because each new state will be saved in the same 
"11 31]... 

The programmer also has the option of specifying the 
functional environment that the interpreter will use to 
simulate a particular .mac file [Ref. 2]. The "e" command 15 
used to enable/disable a functional environment stored in 
the filename.env file. There is no published information or 
documentation on the format of the functional environment 
in the .env file so this option has never been used at the 


Naval Postgraduate School. 
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C. INTERPRETER ERROR STATEMENTS 

As stated before, the purpose of performing an 
interpreter simulation is to check for syntax and logical 
errors in the .mac file” before a full chip design is made by 
the MacPitts compiler. Logical errors can be found by 
performing a simulation run and then comparing the results 
obtained to those expected. Reference 4, pages 47-49, shows 
a good example on how to perform a simulation of a .mac file 
using the MacPitts interpreter. 

Syntax errors in the .mac file are indicated in one of 
two ways by the compiler. First, if the error is severe 
enough the compiler stops the creation of the functional 
environment and displays an error message on the terminal 
screen that will give an indication of the syntax error. The 
compiler then returns the UNIX operating system back to 
the programmer. An example of a severe syntax error is an 
unequal number of open and closed parentheses in the .mac 
file. Less severe syntax errors usually do not show up 
until initial values are loaded into the input or i/o ports 
and signals or until a simulation run is performed. A short 
error message is then displayed on the command line of the 
terminal screen. 

There are over thirty different error messages that the 
compiler can display when a syntax error is found. The error 


messages and their meaning are as follows: 
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internprebtersonrompie Ehe jimterpretenitried to change 
the state (value) of a register but found the current 
state to be emptv (null), possessing no value. This 
error indicates an improper register definition or 
usage in the .mac file. 

Interprekeor errorsp ss same as-l above but for a flag. 
Interpreter Creor sot Same as 1 abeme bw forsa" port. 
Interpreter error 4: same as 1 above but for a signal. 
Interpretter erner 5: Unrecognizable functione Examples 
of some expected functions are setq, not, bit, call 

and if. See reference 2 for a listing of all MacPitts 
functions. 

Interpreter error G: the antecedent of an if statement 
is not t, f or undefined as required. 

Interpreter error 7: the interpreter tried to 
determine the state (value) of a register but found the 
current state to be empty (null), possessing no value. 
This error indicates an improper register definition 

or usage in the .mac file. 


Interpreter error 8: same as 7 above but for a flag. 


Interpreter error 9: same as 7 above but for a process. 
Interpreter error 10: same as error 7 above but for 
aport: 

Interpreter error 11: same as error 7 but for a 

signal. 
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TO: 
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Unrecognizable atomic form: an unknown alpha-numeric 
string is in the .mac file. Check for a missing 
definition or misspelled word. 

Process state out of bounds: the state (value) of 

a process is less than zero. Check the .mac file for 
a statement improperly setting a process to a value 
less than Zero. 

This process has too many returns: a return from a 
subroutine was encountered for which there was no 
previous call statement. Check the .mac file for the 
correct number of returns or for a missing call 
statement. 

This process has too many calls: a call to a 
subroutine was made but no return statement was found. 
Check .mac file for correct number of calls or for a 
missing return statement. 

Invalid bit selector: the bit selector in the data- 
path function "bit" is not between O and the bit size 
of the data-path as required. 

Too many arguments: all MacPitts functions require 
only one or two arguments. Check the .mac file and 
Reference 2. 

Too few arguments: see 17 above. 

A reset signal is needed: a reset signal has not been 
defined when the "process" form is used in the .mac 


file. 
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14. 


Lax 


ገር 


T 


Jos 


itə ə 


20. Domble signal (pomntb) setq;, chip vs. environment: 
the interpreter attempts to set a signal (port) to a 
value different than that assigned to that signal (port) 
by the functional environment from the .env file. 

1) Double signal (pont) setq, chip vs. console:;:. . the 
interpreter attempts to set a signal (port) to a value 
different than that assigned to that signal (port) by 
the programmer using the "s", "t" or "f" commands. 

22. Double signal (port) setq, environment vs. chip: 
the reverse of 20 above. 

23. Double register setq: two different setq statements 
in the .mac file attempt to assign a value to the 
Same register at the same time. 

24. Double process setq: same as 23 above but for a 
process. 

25. Double port setq: same as 23 above but for a port. 

26. Only one character per character-constant: this 
error indicates that an attempt was made to set the 
value of a constant to a character string longer than 
one character. “The value of a constant can be an 
1019861 or a Single character. If the value of a 
constant is set to a single character the ASCII 
equivalent of that character becomes the value of the 
constant. 

In addition to the above syntax error statements there 


are two svntax warning statements. These statements indicate 
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that there may be a syntax error and caution should be 
exercised during the simulation. These warning statements 
are: 

1. This process has undefined state: the interpreter 
has encountered a process in the functional environment 
whose state (value) is undefined. 

2. Antecedent of if is undefined: the interpreter has 
encountered a register, port, Signal, process, or flag 
in the functional environment being used as the 
antecedent of an if statement and whose value is 
undefined. 

The above two warning statements are common for pipeline 
design architectures. Initially the value of the ports, 
registers, processes, Signals, and flags of each stage of 
the pipeline are undefined and will stay undefined until 
data is clocked into and out of each stage. 

The MacPitts interpreter also displays error statements 
in the command line of the terminal screen if an interpreter 
command has been executed improperly by the programmer. 

The interpreter command error statements are: 

1. File not found: .int or .env file cannot be found. 

2. Cannot set this thing to value: only registers and 
ports can be set to value. 

3. Cannot set this thing to t, f: only signals or flags 


can be set to t or f. 
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Cannot set this thing to undefined: processes 
cannot be set to undefined. 

Cannot set ChEO Lu CO tri-state: only input or 
i/o ports and signals can be set to tri-state. 
Invalid command type ? for help: check interpreter 
command list for correct command. 

Cannot input from this port (signal): check for 
input or i/o port (signal). 

Cannot output to this port (signal): check for 


output lorii o port (signal). 
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